NOVEL DNA GLYCOSYLASES AND THEIR USE

This invention relates to new DNA glycosylases, in
particular new cytosine-, thymine- and uracil-DNA
glycosylases, and their use for mutagenesis, for DNA
modification and for cell killing.

Damage to DNA arises continually throughout the cell
cycle and must be recognised and repaired before the
next round of replication to maintain the genomic
integrity of the cell. DNA base damage can be
recognised and excised by the ATP-dependent nucleotide
excision repair system or by base excision repair
systems, exemplified by the DNA glycosylases.

DNA glycosylases are enzymes that occur normally in
cells. They release bases from DNA by cleaving the bond
between deoxyribose and the base in DNA. Naturally
occurring glycosylases remove damaged or incorrectly
placed bases. This base excision repair pathway is the
major cellular defence mechanism against spontaneous DNA
damage.

DNA glycosylases which have been identified are
directed to specific bases or modified bases. An
example of a DNA glycosylase which recognizes an
unmodified base is uracil-DNA glycosylase (UDG), which
specifically recognises uracil in DNA and initiates base
excision repair by hydrolysing the N-C1' glycosylic bond
linking the uracil base to the deoxyribose sugar. This
creates an abasic site that is removed by a 5'-acting
apurinic/apyrimidinic (AP) endonuclease and a
deoxyribophosphodiesterase, leaving a gap which is
filled by DNA polymerase and sealed by DNA ligase.

The activity of UDG serves to remove uracil, which
arises in DNA either from incorporation of dUMP
instead of dTMP during replication or from the
spontaneous deamination of cytosine. Deamination of
cytosine to uracil creates a premutagenic U:G mismatch
that, unless repaired, will cause a GC→AT transition
mutation.
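The route from a U:G mismatch to a GC→AT transition can be traced with a toy model. The sketch below is illustrative only and rests on simplifications that are not in the text: strands are strings paired position by position (strand orientation is ignored), uracil templates adenine exactly as thymine does, and no repair occurs between the two rounds of replication.

```python
# Base inserted opposite each template base during replication (toy model).
# Uracil is read like thymine, so it templates adenine.
PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G", "U": "A"}


def replicate(template: str) -> str:
    """Return the strand synthesised opposite `template` (orientation ignored)."""
    return "".join(PAIRS[base] for base in template)


def deaminate(strand: str, index: int) -> str:
    """Convert the cytosine at 0-based `index` to uracil."""
    if strand[index] != "C":
        raise ValueError("deamination requires a cytosine at this position")
    return strand[:index] + "U" + strand[index + 1:]


top = "ACGT"                            # its partner strand is "TGCA"
damaged = deaminate(top, 1)             # "AUGT": U now faces the G of the partner strand
first_round = replicate(damaged)        # "TACA": the U templates an A
second_round = replicate(first_round)   # "ATGT": the A templates a T
print(top, "->", second_round)          # ACGT -> ATGT: the original C:G pair is now T:A
```

Without UDG acting on the U:G intermediate, one of the granddaughter duplexes carries the transition permanently.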

In vivo, UDGs specifically recognise uracil within
DNA and cleave its glycosylic bond to initiate the
uracil excision pathway. In vitro, UDGs can recognise
and remove uracil from both single-stranded DNA (ssDNA)
and double-stranded DNA (dsDNA) substrates.

UDGs are ubiquitous enzymes and have been isolated
from a number of sources. Amino acid sequencing reveals
that the enzymes are conserved throughout evolution, with
greater than about 55% amino acid identity between human
and bacterial proteins. A cDNA for human UDG has been
cloned and the corresponding gene has been named UNG
(Olsen et al. (1989) EMBO J., 8: 3121-3125).

The crystal structures of the human enzyme (Mol et
al. (1995) Cell, 80: 869-878) and the herpes simplex
virus enzyme (Savva et al. (1995) Nature, 373: 487-493)
have recently been determined and reveal that uracil
binds in a rigid pocket at the base of the DNA-binding
groove of human UDG. The absolute specificity of the
enzyme for uracil over the structurally related DNA
bases thymine and cytosine is conferred by shape
complementarity, as well as by main chain and side chain
hydrogen bonds.

Although UDGs have no activity against other bases,
owing to the aforementioned specific spatial and charge
characteristics of the active site, other glycosylases
with different activities have been identified, which
may or may not be restricted to single substrates.

A naturally occurring thymine-DNA glycosylase has
been identified which, in addition to releasing thymine,
also releases uracil (Nedderman & Jiricny (1993) J.
Biol. Chem., 268: 21218-21224; Nedderman & Jiricny
(1994) Proc. Natl. Acad. Sci. U.S.A., 91: 1642-1646).
This thymine-DNA glycosylase, however, is active on
only certain substrates: it has an absolute requirement
for a mismatched U or T opposite a G in a
double-stranded substrate, and will not recognise T or U
in T(U):A base pairs or in a single-stranded substrate.
DNA glycosylases which recognize and release unmodified
bases other than uracil and thymine (in certain
substrates, as mentioned above) have not been
identified.

A DNA glycosylase recognizing unmodified cytosine
has not been reported, although a
5-hydroxymethylcytosine-DNA glycosylase activity was
detected in mammalian cells (Cannon et al. (1988)
Biochem. Biophys. Res. Comm., 151: 1173-1179). The
sequences of the aforementioned thymine and
5-hydroxymethylcytosine DNA glycosylases have not yet been
reported, and it is unknown whether their active sites
are structurally related to that of UDG.

It has now surprisingly been found that the
substitution of certain UDG amino acids has a
profound effect on the substrate specificity of the
glycosylase. In particular, the replacement of Asn204
by Asp204 results in a mutant enzyme which has
acquired cytosine-DNA glycosylase (CDG) activity, while
retaining some UDG activity. Alternatively, replacing
Tyr147 with Ala147 allows binding of thymine,
resulting in an enzyme that has acquired thymine-DNA
glycosylase (TDG) activity.
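These substitutions are conventionally written in compact form as N204D (Asn204→Asp) and Y147A (Tyr147→Ala). The following minimal sketch assumes a protein held as a one-letter string with 1-based residue numbering. It parses such a code, checks that the expected wild-type residue is present and returns the mutant sequence. The function name and the short peptide used to exercise it are hypothetical.

```python
import re

# One-letter wild-type residue, 1-based position, one-letter replacement, e.g. "N204D".
SUBSTITUTION = re.compile(r"^([ACDEFGHIKLMNPQRSTVWY])(\d+)([ACDEFGHIKLMNPQRSTVWY])$")


def apply_substitution(protein: str, code: str) -> str:
    """Apply a single amino acid substitution such as 'N204D' to `protein`."""
    match = SUBSTITUTION.match(code)
    if match is None:
        raise ValueError(f"not a substitution code: {code!r}")
    wild_type, position, mutant = match.group(1), int(match.group(2)), match.group(3)
    index = position - 1  # convert the 1-based residue number to a 0-based index
    if not 0 <= index < len(protein):
        raise ValueError(f"position {position} lies outside the sequence")
    if protein[index] != wild_type:
        raise ValueError(f"expected {wild_type} at {position}, found {protein[index]}")
    return protein[:index] + mutant + protein[index + 1:]


# Hypothetical 6-residue peptide, used only to exercise the function.
print(apply_substitution("MKNYLT", "N3D"))  # MKDYLT
print(apply_substitution("MKNYLT", "Y4A"))  # MKNALT
```

Checking the wild-type residue before substituting guards against applying a residue number taken from one numbering scheme to a sequence numbered differently.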

These new DNA glycosylases are not product-inhibited
by added uracil, in contrast to UDG and other UDG
mutants. Compared with the efficiency of wild-type UDG
in removing uracil, the activity of the new DNA
glycosylases that remove normal pyrimidines from DNA is
low, but distinct and easily detectable. However, it
should be noted that the very high turnover of UDG
appears to be unique among DNA glycosylases, and the
turnover numbers of other DNA glycosylases may be as low
as, or even lower than, those of the engineered
glycosylases CDG and TDG. The exceptionally high
turnover of UDG may result from its narrow substrate
specificity.

WO 97/25416 PCT/GB97/00057

Furthermore, an additional new UDG has been
identified. The complete sequence of the UNG gene was
recently published (Haug et al., 1996, Genomics, 36,
p408-416). As mentioned previously, a cDNA corresponding
to this UNG gene was identified by Olsen et al., 1989,
supra (hereinafter referred to as UNG1 cDNA, and the
expressed protein as UNG1). It has now surprisingly
been found that alternative splicing of the genomic DNA
(UNG), using a previously unrecognized exon located 5'
of exon 1, results in a new, distinct cDNA with an open
reading frame of 313 amino acids. The new UNG cDNA is
referred to hereinafter as UNG2 cDNA, and the protein
which it encodes as UNG2. The latter protein has a
predicted size of 36 kDa.

UNG2 differs from the previously known form (UNG1,
an ORF of 304 amino acid residues) in the first 44 amino
acids of the N-terminal presequence, which is not
necessary for catalytic activity. The rest of the
presequence and the catalytic domain, altogether 269
amino acids, are identical in the two forms. The
alternative presequence in UNG2 arises by splicing of a
previously unrecognized exon (exon 1A) into a consensus
splice site after codon 35 in exon 1B (previously
designated exon 1). The UNG1 presequence starts at codon
1 in exon 1B and thus has 35 amino acids not present in
UNG2. Coupled transcription/translation in rabbit
reticulocyte lysates demonstrated that both proteins are
catalytically active. Similar forms of UNG1 and UNG2 are
expressed in the mouse, which has an identical
organization of the homologous gene. Furthermore, the
presequence of a putative Xiphophorus UNG2 protein
predicted from the gene structure is homologous to
mammalian UNG2, but much shorter, suggesting a very high
degree of conservation from fish to man.
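Because UNG1 and UNG2 share their last 269 residues but carry form-specific N-terminal regions of 35 and 44 residues respectively, a residue in the shared region is numbered 9 higher in UNG2 than in UNG1. The sketch below assumes, since the text does not say so explicitly, that residue numbers such as Tyr147 and Asn204 refer to the 304-residue UNG1 sequence. Under that assumption the UNG2 translation given with SEQUENCE I.D. Nos 1 and 2 carries Tyr at position 156 and Asn at position 213. The constant and function names are illustrative.

```python
# Region lengths stated in the text (amino acids).
UNG1_SPECIFIC = 35   # UNG1-only presequence residues (codons 1-35 of exon 1B)
UNG2_SPECIFIC = 44   # UNG2-only presequence residues (from exon 1A)
SHARED = 269         # rest of the presequence plus the catalytic domain

OFFSET = UNG2_SPECIFIC - UNG1_SPECIFIC  # 9


def ung1_to_ung2(position: int) -> int:
    """Convert a 1-based UNG1 residue number in the shared region to UNG2 numbering."""
    if not UNG1_SPECIFIC < position <= UNG1_SPECIFIC + SHARED:
        raise ValueError(f"UNG1 residue {position} is not in the shared region")
    return position + OFFSET


assert UNG1_SPECIFIC + SHARED == 304  # UNG1 open reading frame length
assert UNG2_SPECIFIC + SHARED == 313  # UNG2 open reading frame length
print(ung1_to_ung2(147), ung1_to_ung2(204))  # 156 213
```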

The invention therefore provides a DNA glycosylase
capable of releasing cytosine bases from single-stranded
(ss) DNA and/or double-stranded (ds) DNA, or thymine
bases from both ssDNA and dsDNA or from ssDNA, or
uracil bases from ssDNA and/or dsDNA, wherein, in the
case of a uracil-DNA glycosylase, said glycosylase
is encoded by a nucleic acid molecule comprising the
sequence (SEQUENCE I.D. Nos 1 and 2):

1 CACAGCCACA GCCAGGGCTA GCCTCGCCGG TTCCCGGGTG GCGCGCGTTC GCTGCCTCCT

61 CAGCTCCAGG ATGATCGGCC AGAAGACGCT CTACTCCTTT TTCTCCCCCA GCCCCGCCAG
MIG QKT LYSF FSP SPA

121 GAAGCGACAC GCCCCCAGCC CCGAGCCGGC CGTCCAGGGG ACCGGCGTGG CTGGGGTGCC
RKRH APS PEP AVQG TGV AGV

181 TGAGGAAAGC GGAGATGCGG CGGCCATCCC AGCCAAGAAG GCCCCGGCTG GGCAGGAGGA
PEES GDA AAI PAKK APA GQE

241 GCCTGGGACG CCGCCCTCCT CGCCGCTGAG TGCCGAGCAG TTGGACCGGA TCCAGAGGAA
EPGT PPS SPL SAEQ LDR IQR

301 CAAGGCCGCG GCCCTGCTCA GACTCGCGGC CCGCAACGTG CCCGTGGGCT TTGGAGAGAG
NKAA ALL RLA ARNV PVG FGE

361 CTGGAAGAAG CACCTCAGCG GGGAGTTCGG GAAACCGTAT TTTATCAAGC TAATGGGATT
SWKK HLS GEF GKPY FIK LMG

421 TGTTGCAGAA GAAAGAAAGC ATTACACTGT TTATCCACCC CCACACCAAG TCTTCACCTG
FVAE ERK HYT VYPP PHQ VFT

481 GACCCAGATG TGTGACATAA AAGATGTGAA GGTTGTCATC CTGGGACAGG ATCCATATCA
WTQM CDI KDV KVVI LGQ DPY

541 TGGACCTAAT CAAGCTCACG GGCTCTGCTT TAGTGTTCAA AGGCCTGTTC CGCCTCCGCC
HGPN QAH GLC FSVQ RPV PPP

601 CAGTTTGGAG AACATTTATA AAGAGTTGTC TACAGACATA GAGGATTTTG TTCATCCTGG
PSLE NIY KEL STDI EDF VHP

661 CCATGGAGAT TTATCTGGGT GGGCCAAGCA AGGTGTTCTC CTTCTCAACG CTGTCCTCAC
GHGD LSG WAK QGVL LLN AVL

721 GGTTCGTGCC CATCAAGCCA ACTCTCATAA GGAGCGAGGC TGGGAGCAGT TCACTGATGC
TVRA HQA NSH KERG WEQ FTD

781 AGTTGTGTCC TGGCTAAATC AGAACTCGAA TGGCCTTGTT TTCTTGCTCT GGGGCTCTTA
AVVS WLN QNS NGLV FLL WGS

841 TGCTCAGAAG AAGGGCAGTG CCATTGATAG GAAGCGGCAC CATGTACTAC AGACGGCTCA
YAQK KGS AID RKRH HVL QTA

901 TCCCTCCCCT TTGTCAGTGT ATAGAGGGTT CTTTGGATGT AGACACTTTT CAAAGACCAA
HPSP LSV YRG FFGC RHF SKT

961 TGAGCTGCTG CAGAAGTCTG GCAAGAAGCC CATTGACTGG AAGGAGCTGT GATCATCAGC
NELL QKS GKK PIDW KEL

1021 TGAGGGGTGG CCTTTGAGAA GCTGCTGTTA ACGTATTTGC CAGTTACGAA GTTCCACTGA

1081 AAATTTTCCT ATTAATTCTT AAGTACTCTG CATAAGGGGG AAAAGCTTCC AGAAAGCAGC

1141 CATGAACCAG GCTGTCCAGG AATGGCAGCT GTATCCAACC ACAAACAACA AAGGCTACCC

1201 TTTGACCAAA TGTCTTTCTC TGCAACATGG CTTCGGCCTA AAATATGCAG AAGACAGATG

1261 AGGTCAAATA CTCAGTTGGC TCTCTTTATC TCCCTTGCCT TTATGGTGAA ACAGGGGAGA

1321 TGTGCACCTT TCAGGCACAG CCCTAGTTTG GCGCCTGCTG CTCCTTGGTT TTGCCTGGTT

1381 AGACTTTCAG TGACAGATGT TGGGGTGTTT TTGCTTAGAA AGGTCCCCTT GTCTCAGCCT

1441 TGCAGGGCAG GCATGCCAGT CTCTGCCAGT TCCACTGCCC CCTTGATCTT TGAAGGAGTC

1501 CTCAGGCCCC TCGCAGCATA AGGATGTTTT GCAACTTTCC AGAATCTGGC CCAGAAATTA

1561 GGGCTCAATT TCCTGATTGT AGTAGAGGTT AAGATTGCTG TGAGCTTTAT CAGATAAGAG

1621 ACCGAGAGAA GTAAGCTGGG TCTTGTTATT CCTTGGGTGT TGGTGGAATA AGCAGTGGAA

1681 TTTGAACAAG GAAGAGGAGA AAAGGGAATT TTGTCTTTAT GGGGTGGGGT GATTTTCTCC

1741 TAGGGTTATG TCCAGTTGGG GTTTTTAAGG CAGCACAGAC TGCCAAGTAC TGTTTTTTTT

1801 AACCGACTGA AATCACTTTG GGATATTTTT TCCTGCAACA CTGGAAAGTT TTAGTTTTTT

1861 AAGAAGTACT CATGCAGATA TATATATATA TATTTTTCCC AGTCCTTTTT TTAAGAGACG

1921 GTCTTTATTG GGTCTGCACC TCCATCCTTG ATCTTGTTAG CAATGCTGTT TTTGCTGTTA

1981 GTCGGGTTAG AGTTGGCTCT ACGCGAGGTT TGTTAATAAA AGTTTGTTAA AAGTTCAAAA

2041 AAAAAAAAAA AAA

or a fragment thereof encoding a catalytically active
product comprising at least nucleotides 121 to 130,
preferably 71 to 202, in addition to the catalytic
domain, or a sequence which is degenerate, substantially
homologous with or which hybridizes with at least
nucleotides 121 to 130, preferably 71 to 202, of any such
aforesaid sequence.
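The nucleotide coordinates in this aspect can be related to the codons of the open reading frame, which starts with the ATG at nucleotide 71. The following minimal sketch uses only nucleotides 1 to 240 of the sequence above and the standard genetic code. It shows that nucleotides 71 to 202 are exactly codons 1 to 44, the UNG2-specific presequence, and that nucleotides 121 to 130 lie within codons 17 to 20, which encode RKRH. The helper names are illustrative.

```python
from itertools import product

# Standard genetic code; codons enumerated in TCAG order (TTT, TTC, TTA, TTG, TCT, ...).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product("TCAG", repeat=3), AMINO_ACIDS)}

# Nucleotides 1-240 of the sequence above, copied in its ten-base groups.
SEQ_1_240 = "".join("""
CACAGCCACA GCCAGGGCTA GCCTCGCCGG TTCCCGGGTG GCGCGCGTTC GCTGCCTCCT
CAGCTCCAGG ATGATCGGCC AGAAGACGCT CTACTCCTTT TTCTCCCCCA GCCCCGCCAG
GAAGCGACAC GCCCCCAGCC CCGAGCCGGC CGTCCAGGGG ACCGGCGTGG CTGGGGTGCC
TGAGGAAAGC GGAGATGCGG CGGCCATCCC AGCCAAGAAG GCCCCGGCTG GGCAGGAGGA
""".split())

ORF_START = 71  # 1-based position of the A of the initiating ATG


def codon_number(nt: int) -> int:
    """1-based codon of the reading frame that contains 1-based nucleotide `nt`."""
    return (nt - ORF_START) // 3 + 1


def translate(dna: str) -> str:
    """Translate an in-frame DNA string, ignoring any incomplete final codon."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


presequence = translate(SEQ_1_240[ORF_START - 1:202])  # nucleotides 71-202
print(len(presequence), presequence)
print(codon_number(121), codon_number(130))  # 17 20
print(translate(SEQ_1_240[118:130]))         # nucleotides 119-130 (codons 17-20): RKRH
```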

In particular, viewed from one aspect, the invention
can be seen as providing a cytosine-DNA glycosylase
(CDG) capable of releasing cytosine bases from ssDNA
and/or dsDNA.

A further aspect of the invention provides a
cytosine-DNA glycosylase (CDG) capable of releasing both
cytosine and uracil bases from ssDNA and/or dsDNA.
Preferably, the cytosine-DNA glycosylase is derived
from a UDG, and especially from the human UDG
protein, which has Asn at amino acid position 204. In
particular, the novel CDG of the invention is preferably
derived from human UDG and has an amino acid
substitution or modification at position 204.
Modification of UDG from other species at an equivalent
residue is similarly preferred. Especially preferably,
the glycosylase is human UDG having an aspartic acid
residue (Asp) at position 204.

Another aspect of the invention provides a
thymine-DNA glycosylase (TDG) capable of releasing
thymine bases from both ssDNA and dsDNA.

A further aspect of the invention provides a
thymine-DNA glycosylase (TDG) capable of releasing both
thymine and uracil bases from both ssDNA and dsDNA.

Yet further aspects of the invention provide a
thymine-DNA glycosylase (TDG) capable of releasing
thymine bases from A:T base pairs in DNA and a
thymine-DNA glycosylase (TDG) capable of releasing
thymine bases from single-stranded DNA.

Preferably, the thymine-DNA glycosylase is derived
from a UDG, and especially from the human UDG
protein, which has Tyr at amino acid position 147. In
particular, the novel TDG of the invention is preferably
derived from human UDG and has an amino acid
substitution or modification at position 147.
Modification of UDG from other species at an equivalent
residue is similarly preferred. Especially preferably,
the glycosylase is human UDG having an alanine residue
(Ala) at position 147.

A yet further aspect of the invention provides a
uracil-DNA glycosylase encoded by a nucleic acid
molecule comprising the sequence (SEQUENCE I.D. Nos 1 and
2):

1 CACAGCCACA GCCAGGGCTA GCCTCGCCGG TTCCCGGGTG GCGCGCGTTC GCTGCCTCCT

61 CAGCTCCAGG ATGATCGGCC AGAAGACGCT CTACTCCTTT TTCTCCCCCA GCCCCGCCAG
MIG QKT LYSF FSP SPA

121 GAAGCGACAC GCCCCCAGCC CCGAGCCGGC CGTCCAGGGG ACCGGCGTGG CTGGGGTGCC
RKRH APS PEP AVQG TGV AGV

181 TGAGGAAAGC GGAGATGCGG CGGCCATCCC AGCCAAGAAG GCCCCGGCTG GGCAGGAGGA
PEES GDA AAI PAKK APA GQE

241 GCCTGGGACG CCGCCCTCCT CGCCGCTGAG TGCCGAGCAG TTGGACCGGA TCCAGAGGAA
EPGT PPS SPL SAEQ LDR IQR

301 CAAGGCCGCG GCCCTGCTCA GACTCGCGGC CCGCAACGTG CCCGTGGGCT TTGGAGAGAG
NKAA ALL RLA ARNV PVG FGE

361 CTGGAAGAAG CACCTCAGCG GGGAGTTCGG GAAACCGTAT TTTATCAAGC TAATGGGATT
SWKK HLS GEF GKPY FIK LMG

421 TGTTGCAGAA GAAAGAAAGC ATTACACTGT TTATCCACCC CCACACCAAG TCTTCACCTG
FVAE ERK HYT VYPP PHQ VFT

481 GACCCAGATG TGTGACATAA AAGATGTGAA GGTTGTCATC CTGGGACAGG ATCCATATCA
WTQM CDI KDV KVVI LGQ DPY

541 TGGACCTAAT CAAGCTCACG GGCTCTGCTT TAGTGTTCAA AGGCCTGTTC CGCCTCCGCC
HGPN QAH GLC FSVQ RPV PPP

601 CAGTTTGGAG AACATTTATA AAGAGTTGTC TACAGACATA GAGGATTTTG TTCATCCTGG
PSLE NIY KEL STDI EDF VHP

661 CCATGGAGAT TTATCTGGGT GGGCCAAGCA AGGTGTTCTC CTTCTCAACG CTGTCCTCAC
GHGD LSG WAK QGVL LLN AVL

721 GGTTCGTGCC CATCAAGCCA ACTCTCATAA GGAGCGAGGC TGGGAGCAGT TCACTGATGC
TVRA HQA NSH KERG WEQ FTD

781 AGTTGTGTCC TGGCTAAATC AGAACTCGAA TGGCCTTGTT TTCTTGCTCT GGGGCTCTTA
AVVS WLN QNS NGLV FLL WGS

841 TGCTCAGAAG AAGGGCAGTG CCATTGATAG GAAGCGGCAC CATGTACTAC AGACGGCTCA
YAQK KGS AID RKRH HVL QTA

901 TCCCTCCCCT TTGTCAGTGT ATAGAGGGTT CTTTGGATGT AGACACTTTT CAAAGACCAA
HPSP LSV YRG FFGC RHF SKT

961 TGAGCTGCTG CAGAAGTCTG GCAAGAAGCC CATTGACTGG AAGGAGCTGT GATCATCAGC
NELL QKS GKK PIDW KEL

1021 TGAGGGGTGG CCTTTGAGAA GCTGCTGTTA ACGTATTTGC CAGTTACGAA GTTCCACTGA

1081 AAATTTTCCT ATTAATTCTT AAGTACTCTG CATAAGGGGG AAAAGCTTCC AGAAAGCAGC

1141 CATGAACCAG GCTGTCCAGG AATGGCAGCT GTATCCAACC ACAAACAACA AAGGCTACCC

1201 TTTGACCAAA TGTCTTTCTC TGCAACATGG CTTCGGCCTA AAATATGCAG AAGACAGATG

1261 AGGTCAAATA CTCAGTTGGC TCTCTTTATC TCCCTTGCCT TTATGGTGAA ACAGGGGAGA

1321 TGTGCACCTT TCAGGCACAG CCCTAGTTTG GCGCCTGCTG CTCCTTGGTT TTGCCTGGTT

1381 AGACTTTCAG TGACAGATGT TGGGGTGTTT TTGCTTAGAA AGGTCCCCTT GTCTCAGCCT

1441 TGCAGGGCAG GCATGCCAGT CTCTGCCAGT TCCACTGCCC CCTTGATCTT TGAAGGAGTC

1501 CTCAGGCCCC TCGCAGCATA AGGATGTTTT GCAACTTTCC AGAATCTGGC CCAGAAATTA

1561 GGGCTCAATT TCCTGATTGT AGTAGAGGTT AAGATTGCTG TGAGCTTTAT CAGATAAGAG

1621 ACCGAGAGAA GTAAGCTGGG TCTTGTTATT CCTTGGGTGT TGGTGGAATA AGCAGTGGAA

1681 TTTGAACAAG GAAGAGGAGA AAAGGGAATT TTGTCTTTAT GGGGTGGGGT GATTTTCTCC

1741 TAGGGTTATG TCCAGTTGGG GTTTTTAAGG CAGCACAGAC TGCCAAGTAC TGTTTTTTTT

1801 AACCGACTGA AATCACTTTG GGATATTTTT TCCTGCAACA CTGGAAAGTT TTAGTTTTTT

1861 AAGAAGTACT CATGCAGATA TATATATATA TATTTTTCCC AGTCCTTTTT TTAAGAGACG

1921 GTCTTTATTG GGTCTGCACC TCCATCCTTG ATCTTGTTAG CAATGCTGTT TTTGCTGTTA

1981 GTCGGGTTAG AGTTGGCTCT ACGCGAGGTT TGTTAATAAA AGTTTGTTAA AAGTTCAAAA

2041 AAAAAAAAAA AAA

or a fragment thereof encoding a catalytically active
product comprising at least nucleotides 121 to 130,
preferably 71 to 202, in addition to the catalytic
domain, or a sequence which is degenerate, substantially
homologous with or which hybridizes with at least
nucleotides 121 to 130, preferably 71 to 202, of any such
aforesaid sequence. Preferably such degeneracy,
homology or hybridization applies to the entire
sequence.

"Catalytically active product" as used herein refers
to any product encoded by said sequence which exhibits
uracil DNA glycosylase activity.

"Substantially homologous" as used herein includes
those sequences having a sequence homology of
approximately 60% or more, eg. 70% or 80% or more, and
also functionally equivalent allelic variants and
related sequences modified by single or multiple base
substitution, addition and/or deletion. By
"functionally equivalent" in this sense is meant
nucleotide sequences which encode catalytically active
polypeptides, ie. having uracil DNA glycosylase
activity.
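The "approximately 60% or more" criterion presupposes some measure of sequence identity, which the text does not define further. The sketch below is an illustration only. It scores two sequences that have already been aligned to equal length (gaps written as '-') and counts identical positions over the columns where neither sequence has a gap. In practice the alignment itself would come from a dedicated alignment program, and other definitions of identity exist. The function names are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percentage of identical residues over columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(a, b) for a, b in zip(aligned_a.upper(), aligned_b.upper())
               if a != "-" and b != "-"]
    if not columns:
        raise ValueError("no ungapped columns to compare")
    matches = sum(a == b for a, b in columns)
    return 100.0 * matches / len(columns)


def substantially_homologous(aligned_a: str, aligned_b: str, threshold: float = 60.0) -> bool:
    """Apply the approximately-60% criterion; pass 70.0 or 80.0 for the stricter variants."""
    return percent_identity(aligned_a, aligned_b) >= threshold


# Hypothetical aligned fragments: 7 of 10 ungapped columns are identical.
print(percent_identity("ACGTACGTAC", "ACGTTCGAAG"))  # 70.0
```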

Sequences which "hybridize" are those sequences
binding under non-stringent conditions (eg. 6 x SSC, 50%
formamide, room temperature) and washed under
conditions of low stringency (eg. 2 x SSC, room
temperature, more preferably 2 x SSC, 42°C) or
conditions of higher stringency (eg. 2 x SSC, 65°C)
(where SSC = 0.15M NaCl, 0.015M sodium citrate, pH 7.2).
Generally speaking, sequences which hybridize under
conditions of high stringency are included within the
scope of the invention, as are sequences which, but for
the degeneracy of the genetic code, would hybridize under
high stringency conditions.
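Because the hybridization and wash conditions are expressed as multiples of SSC, the salt concentrations follow by simple scaling of the 1 x SSC definition given above (0.15M NaCl, 0.015M sodium citrate). The helper below, whose name is illustrative, performs only this arithmetic. It makes no claim about how stringency depends on salt concentration or temperature.

```python
# 1 x SSC as defined in the text (molar concentrations).
SSC_1X = {"NaCl": 0.15, "sodium citrate": 0.015}


def ssc_concentrations(multiple: float) -> dict:
    """Molar NaCl and sodium citrate concentrations in `multiple` x SSC."""
    # Rounding removes floating-point noise such as 0.8999999999999999.
    return {component: round(multiple * molar, 6) for component, molar in SSC_1X.items()}


for multiple in (6, 2):
    print(f"{multiple} x SSC:", ssc_concentrations(multiple))
# 6 x SSC: {'NaCl': 0.9, 'sodium citrate': 0.09}
# 2 x SSC: {'NaCl': 0.3, 'sodium citrate': 0.03}
```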

The significance of the UNG1 and UNG2 presequences has
also been investigated in the present invention, using
constructs that express fusion products of UNG1
or UNG2 and enhanced green fluorescent protein (EGFP).

Surprisingly, the presequences had significant effects on
subcellular targeting: after transient transfection
of HeLa cells, the pUNG1-EGFP-N1 product co-localized
with mitochondria, whereas the pUNG2-EGFP-N1 product
was targeted exclusively to nuclei. Whilst not wishing to
be bound by theory, it appears that these sequences may
be instrumental in the localization of the enzymes. The
putative nuclear signal was identified as RKRH, which
also appears in the catalytic domain of both UNG1 and
UNG2. Whilst it was recognized previously by Slupphaug
et al., 1993, Nucl. Acids Res., 21(11), p2579-2584, that
the signal for mitochondrial translocation resides in
the UNG1 presequence, it was believed that the signal
for nuclear import lay within the mature protein, since
in the absence of the presequence UNG1 was transported to
the nucleus. However, UNG2 has now been identified,
which has a presequence and which localizes to the
nucleus. These presequences thus have utility for
directing the subcellular localization of molecules
attached to them.
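The observation that RKRH occurs both in the UNG2 presequence and in the shared catalytic domain can be checked directly. The sketch below assumes that UNG2 is the 313-residue translation of the open reading frame of SEQUENCE I.D. No 1. It also assumes that UNG1 is the 35-residue mitochondrial presequence given with SEQUENCE I.D. Nos 5 and 6, followed by the 269 residues shared with UNG2. The `motif_positions` helper is illustrative.

```python
# UNG2 protein (313 residues), translated from the open reading frame of
# SEQUENCE I.D. No 1 and written in groups of ten residues.
UNG2 = "".join("""
MIGQKTLYSF FSPSPARKRH APSPEPAVQG TGVAGVPEES GDAAAIPAKK APAGQEEPGT
PPSSPLSAEQ LDRIQRNKAA ALLRLAARNV PVGFGESWKK HLSGEFGKPY FIKLMGFVAE
ERKHYTVYPP PHQVFTWTQM CDIKDVKVVI LGQDPYHGPN QAHGLCFSVQ RPVPPPPSLE
NIYKELSTDI EDFVHPGHGD LSGWAKQGVL LLNAVLTVRA HQANSHKERG WEQFTDAVVS
WLNQNSNGLV FLLWGSYAQK KGSAIDRKRH HVLQTAHPSP LSVYRGFFGC RHFSKTNELL
QKSGKKPIDW KEL
""".split())

# UNG1: the 35-residue mitochondrial presequence followed by the 269 residues
# it shares with UNG2 (UNG2 residues 45-313).
UNG1 = "MGVFCLGPWGLGRKLRTPGKGPLQLLSRLCGDHLQ" + UNG2[44:]


def motif_positions(protein: str, motif: str) -> list[int]:
    """1-based start positions of every occurrence of `motif`, overlaps included."""
    positions = []
    start = protein.find(motif)
    while start != -1:
        positions.append(start + 1)
        start = protein.find(motif, start + 1)
    return positions


print(len(UNG2), motif_positions(UNG2, "RKRH"))  # 313 [17, 267]
print(len(UNG1), motif_positions(UNG1, "RKRH"))  # 304 [258]
```

Under these assumptions RKRH appears twice in UNG2 (residues 17 to 20 and 267 to 270) but only once, in the shared domain, in UNG1.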

Thus, viewed from a further aspect, the invention
provides nuclear localization peptides encoded by a
nucleic acid molecule comprising the sequence (SEQUENCE
I.D. Nos 3 and 4):

ATGATCGGCC AGAAGACGCT CTACTCCTTT TTCTCCCCCA GCCCCGCCAG
MIG QKT LYSF FSP SPA

GAAGCGACAC GCCCCCAGCC CCGAGCCGGC CGTCCAGGGG ACCGGCGTGG CTGGGGTGCC
RKRH APS PEP AVQG TGV AGV

TGAGGAAAGC GGAGATGCGG CG
PEES GDA A

or a fragment thereof encoding a functional equivalent,
or a sequence which is degenerate, substantially
homologous with or which hybridizes with any such
aforesaid sequence.

Functionally equivalent fragments are those encoding
products which may serve as appropriate localization
peptides. Especially preferred nuclear localizing
peptides are those which include the amino acid
sequence RKRH.

A further preferred feature of the invention
comprises DNA glycosylases of the invention which
additionally comprise at least one of the aforesaid
nuclear localization peptide sequences or at least one
mitochondrial localization peptide sequence encoded by a
nucleic acid molecule comprising the sequence (SEQUENCE
I.D. Nos 5 and 6):

ATGGGCGTCT TCTGCCTTGG GCCGTGGGGG TTGGGCCGGA AGCTGCGGAC GCCTGGGAAG
MGV FCL GPWG LGR KLR TPGK

GGGCCGCTGC AGCTCTTGAG CCGCCTCTGC GGGGACCACT TGCAG
GPL QLL SRLC GDH LQ

or a fragment thereof encoding a functional equivalent,
or a sequence which is degenerate, substantially
homologous with or which hybridizes with any such
aforesaid sequence. An example is a CDG or TDG carrying
such a localization peptide. Such a composite may be
prepared, for example, by appropriate modification of
UNG1 or UNG2.
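The 35-residue mitochondrial presequence can be read directly from the nucleotide sequence above using the standard genetic code. The following minimal sketch (helper names are illustrative) translates it and confirms that the fragment contains no stop codon.

```python
from itertools import product

# Standard genetic code; codons enumerated in TCAG order (TTT, TTC, TTA, TTG, TCT, ...).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product("TCAG", repeat=3), AMINO_ACIDS)}

# Mitochondrial localization sequence (SEQUENCE I.D. No 5), as printed above.
MITO_DNA = "".join("""
ATGGGCGTCT TCTGCCTTGG GCCGTGGGGG TTGGGCCGGA AGCTGCGGAC GCCTGGGAAG
GGGCCGCTGC AGCTCTTGAG CCGCCTCTGC GGGGACCACT TGCAG
""".split())


def translate(dna: str) -> str:
    """Translate an in-frame DNA string whose length is a whole number of codons."""
    if len(dna) % 3:
        raise ValueError("length is not a whole number of codons")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


peptide = translate(MITO_DNA)
print(len(MITO_DNA), len(peptide), peptide)
# 105 35 MGVFCLGPWGLGRKLRTPGKGPLQLLSRLCGDHLQ
```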

The novel DNA glycosylases of the invention
may conveniently be obtained by modification of existing
DNA glycosylase enzymes, such as the human UDG mentioned
above. Such modification, for example by replacement,
addition or deletion of one or more amino acid residues,
or indeed by chemical modification of amino acid
residues, may readily be achieved using methods well
known in the art, and includes modifications both at the
protein level and at the level of the encoding nucleic
acid. For example, site-directed mutagenesis techniques
are widely described in the literature. Other
conventional mutagenesis treatments which may be used to
obtain enzymes according to the invention include random
or regional random mutagenesis by chemical agents, such
as N-nitroso compounds, or physical agents, such as
ultraviolet light, as well as random or regional random
mutagenesis by polymerase chain reaction (PCR) methods.
Regional random mutagenesis may be carried out by
subcloning one or more DNA sequences encoding relevant
segments of the starting protein, e.g. UDG, followed by
random mutagenesis of this fragment or fragments. After
the fragments have been mutagenized they may be
reinserted into a DNA sequence encoding the starting
protein, e.g. UDG. Individual colonies may then be
screened for novel DNA glycosylases of the invention
using the assay methods described herein.

Alternatively, the novel DNA glycosylases of the
invention may be obtained by other techniques, for
example polypeptide synthesis, construction of fusion
proteins etc.

DNA glycosylase activity may readily be assayed
according to techniques well known in the art; see, for
example, Slupphaug et al. (1995) Biochemistry, 34:
128-138, and Nedderman & Jiricny, supra. Assays for DNA
glycosylase activity may be used to identify enzymes
according to the invention. The enzymes may be
naturally occurring or formed as the result of
manipulations of naturally occurring gene sequences or
products. Thus, for example, a cell-free extract may be
assayed using a thymine- or cytosine-containing
substrate to identify enzymes which excise one or more
of these bases. For the purposes of assessment, the
cytosine and thymine bases in the substrates are
conveniently labelled, for example fluorescently or
radioactively, e.g. with ³H. Suitable substrates may be
prepared by methods known in the art, e.g. by nick
translation, random priming, PCR or chemical synthesis.
To ascertain whether the enzymes are also capable of
excising uracil, substrates including uracil may also be
used. Conveniently, the uracil bases should be labelled
to allow detection. Assays for the excision of different
bases are preferably performed independently.
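When the released base is radiolabelled (e.g. with ³H) and separated from the remaining polymeric substrate, activity is commonly expressed as the blank-corrected fraction of label released. The following minimal sketch uses hypothetical counts. The separation method, the no-enzyme blank and the function names are assumptions and are not details given in the text.

```python
def percent_released(released_cpm: float, blank_cpm: float, total_cpm: float) -> float:
    """Blank-corrected percentage of the labelled base released from the substrate."""
    if total_cpm <= 0:
        raise ValueError("total_cpm must be positive")
    net = max(released_cpm - blank_cpm, 0.0)  # never report negative release
    return 100.0 * net / total_cpm


def pmol_released(released_cpm: float, blank_cpm: float, total_cpm: float,
                  labelled_base_pmol: float) -> float:
    """Amount of base released, given the pmol of labelled base in the substrate."""
    return percent_released(released_cpm, blank_cpm, total_cpm) / 100.0 * labelled_base_pmol


# Hypothetical counts: 20,000 cpm of labelled substrate per reaction,
# 1,500 cpm released with enzyme, 300 cpm released in a no-enzyme blank.
pct = percent_released(1500, 300, 20000)
amount = pmol_released(1500, 300, 20000, labelled_base_pmol=10.0)
print(f"{pct:.1f}% released, {amount:.2f} pmol")  # 6.0% released, 0.60 pmol
```

Running each base-specific substrate in a separate reaction, as the text recommends, keeps these figures attributable to a single base.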

Thus, viewed from a yet further aspect, the
invention provides an assay for the identification of
DNA glycosylases of the invention in a sample, said
assay comprising at least the step of assaying the
sample for an activity capable of excising thymine or
cytosine, and optionally also uracil, from an
introduced ssDNA and/or dsDNA substrate. Optionally,
the moiety responsible for such activity may be
isolated. Suitable assays are described herein and are
also known in the art.

DNA glycosylases of the invention include
modifications of human UDG by amino acid replacement, as
mentioned above, especially at positions 204 and 147.
Such amino acid-substituted mutants of human UDG may
also comprise additional modifications, for example
truncation at the N- and/or C-terminus, chemical
derivatization of amino acid residues, and/or addition,
deletion or mutation of constituent residues which do
not affect the overall specificity of the enzyme.

Derivatives of UDG or other DNA glycosylase enzymes
from other genera or species, having the CDG or TDG
functional activity mentioned above, are also included
within the scope of the invention. It will be
appreciated that appropriate modification of such
enzymes would be performed on residues comparable to
those in the human enzyme which form part of the active
site. Such residues could be identified by methods known
in the art, e.g. by sequence comparison with human UDG
and/or by mutation of residues identified as potentially
conferring specificity on the enzyme, followed by
substrate specificity analyses of the mutant enzymes
thus obtained.

The novel DNA glycosylases of the invention may have
a number of uses, for example as tools in molecular
biology procedures, most notably in mutagenesis, both in
vitro and in vivo, but also in other areas such as cell
killing, removal of contaminating DNA, random
degradation of DNA, enzymatic DNA sequencing etc.

In light of the identification of mitochondrial and
nuclear localizing peptides, it is now possible to
direct human uracil-DNA glycosylase either to nuclei or
to mitochondria by making constructs containing either a
nuclear localization signal, such as that in UNG2, or a
mitochondrial localization signal, such as that in UNG1,
as mentioned above. Whilst this alone may be used to
mutate RNA in the cells, this is particularly useful in
combination with site-directed mutations that give rise
to mutants with either TDG activity or CDG activity,
because it allows selective mutagenesis of nuclear
DNA or mitochondrial DNA. Furthermore, it is useful in
a system where either nuclear or mitochondrial DNA is
the target for degradation for the purpose of killing
cells, eg. cancer cells.

As mentioned above, DNA glycosylases according to
the invention may be used in a mutagenesis system both
in vitro and in vivo. These proteins have numerous
advantages over typical chemical mutagens, particularly
regarding their ease of use. Small-molecule mutagens,
such as methylnitrosourea (MNU), methyl methanesulfonate
(MMS) or methylnitrosoguanidine (MNNG), are very toxic on
contact with the eyes, skin or mucous membranes and may
decompose to explosive and volatile toxic compounds.
Other mutagens, such as dimethylnitrosamine and
benzo(a)pyrene, require metabolic activation by special
enzymes that are only present in some cells. They can
therefore only be used under certain experimental
conditions and will often require the addition of a
fraction containing activating enzymes. All these
chemical mutagens therefore require specialised
precautions in order to protect the user. One major
advantage of DNA glycosylases according to the invention
is that they are not volatile and are not harmful to the
user, for example through mere skin contact.

Mutagenesis in vitro may be performed on a complex
sample, e.g. a cell-free extract, on a partially refined
sample, e.g. a nucleic acid-enriched or purified sample,
or on a single population of nucleic acid material, e.g.
amplified nucleic acid material. Random mutation may be
performed using selected DNA glycosylases of the
invention (possibly in combination with one another
and/or with known DNA glycosylases) to release
particular bases or combinations of bases from the
nucleic acid substrate. Removal of the resulting abasic
sites and replacement of the removed bases with other
bases may be performed by provision of appropriate
enzymes and bases.

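A toy simulation of random mutagenesis with a cytosine-specific glycosylase may make the idea concrete. It is illustrative only. Each cytosine is excised with a chosen probability, and the site is then refilled with a different base drawn at random, standing in for the enzymes and bases supplied after excision. Sequence context, strandedness and the enzymology of gap repair are all ignored. The function name and parameters are hypothetical.

```python
import random


def glycosylase_mutagenesis(dna: str, targets: set[str], excision_prob: float,
                            rng: random.Random) -> tuple[str, list[int]]:
    """Toy model: excise target bases at random and refill each site with a different base.

    Returns the mutated sequence and the 0-based positions that were changed.
    """
    bases = list(dna)
    changed = []
    for i, base in enumerate(dna):
        if base in targets and rng.random() < excision_prob:
            # The excised base is replaced by one of the three other bases.
            bases[i] = rng.choice([b for b in "ACGT" if b != base])
            changed.append(i)
    return "".join(bases), changed


rng = random.Random(1)  # fixed seed so the run is reproducible
original = "ACGTCCGATCGC"
mutant, sites = glycosylase_mutagenesis(original, {"C"}, excision_prob=0.5, rng=rng)
print(original)
print(mutant, sites)
```

Passing a set such as `{"C", "T"}` mimics combining a CDG with a TDG, in the spirit of using several glycosylases together.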
Specific mutagenesis may be performed in a number of
ways. Depending on the specificity of the DNA
glycosylase for ssDNA or dsDNA, either one or the other
type of DNA may be targeted. One application of such a
method may be to introduce labelled bases into the
target DNA to identify its presence or amount in the
total nucleic acid material. Alternatively, the
substrate which is uniquely recognizable (e.g. dsDNA)
may be made sensitive to digestion or degradation after
release of the appropriate base by DNA glycosylase
activity, when replacement of the base has not been
performed. This may then be used to remove certain
ssDNA or dsDNA from a sample. Such an application is
discussed in more detail hereinafter.

Another application involves the introduction of
selected bases after release of the specific bases
recognized by the DNA glycosylase. In this way,
specific bases may be replaced by specific other bases.
It is known in the art that the efficiency of uracil
excision by human UDG depends on the sequence
surrounding the uracil base (Slupphaug et al., 1995,
supra). Appropriate selection of enzyme concentrations
and other determinants may be employed to excise
specific bases from known sequences or, alternatively,
by replacement with appropriately labelled bases, to
determine the presence of such sequences in nucleic acid
samples.

For mutagenesis in vivo, e.g. in a cell, a
nucleotide sequence encoding a DNA glycosylase according
to the invention, carried in a suitable expression
vector, may be introduced into the cell by any
suitable means, for example by transformation or
through the use of liposomes.

A further aspect of the present invention thus
provides a nucleic acid molecule comprising a nucleotide
sequence which encodes a DNA glycosylase and/or nuclear
localizing peptide of the invention as defined above.
Such nucleic acid molecules may readily be prepared
using conventional techniques well known in the art.
Thus, for example, as already mentioned above, known
gene sequences coding for DNA glycosylases, e.g. the UNG
gene mentioned above, may be modified, e.g. by
nucleotide substitution, using standard techniques such
as site-directed mutagenesis.

In further aspects the invention also provides an
expression vector containing a nucleic acid molecule of
the invention, and transformed or transfected host cells
carrying a nucleic acid molecule of the invention.

The expression vector may be any conventional
expression vector known in the art or described in the
literature, including both phage and plasmid vectors.
In general, these will comprise suitable regulatory
sequences, e.g. a promoter and/or enhancer, operably
connected to a gene expressing the enzyme. Suitable
promoters include the SV40 early or late promoter (e.g.
the pSVL vector), the cytomegalovirus (CMV) promoter and
the mouse mammary tumour virus long terminal repeat,
although inducible promoters, e.g. the mouse
metallothionein I promoter, are preferably used. The
vector preferably includes a suitable marker such as a
gene for dihydrofolate reductase or glutamine synthetase.
The expression vector may for example be an inducible
vector, such as the E. coli vector pTrc99A (see
Slupphaug (supra)), which is inducible with isopropyl
β-D-thiogalactopyranoside (IPTG). Other suitable
expression vectors include any vector carrying an
inducible promoter, such as lac, or the bacteriophage
lambda PL promoter, which is under the control of a
temperature-sensitive repressor (cI). Examples of such
vectors are pKK223-2 and pPL-Lambda Inducible (from
Pharmacia). The DNA glycosylases of the invention may
also be expressed as fusion proteins. The expression of
such fusion proteins may facilitate purification, e.g.
by using a GST gene fusion system, exemplified by the
pGEX vector systems (Pharmacia), or by fusion to peptide
sequences that are recognized by specific antibodies,
exemplified by the FLAG expression vectors (Kodak).

The host cell may likewise be any suitable host cell
known in the art, including both eukaryotic cells, e.g.
yeast, mammalian and plant cells, and prokaryotic cells,
e.g. bacteria.

Transfection and transformation techniques are also
well known in the art, as described for example in
Sambrook et al. (1989, Molecular Cloning: A Laboratory
Manual, 2nd Ed., Cold Spring Harbor Laboratory Press,
Cold Spring Harbor, N.Y.), as indeed are other techniques
for introducing nucleic acids into cells, for example
using calcium phosphate, DEAE-dextran, polybrene,
protoplast fusion, liposomes, electroporation, direct
microinjection, a gene gun, etc.

Expression of the DNA glycosylase according to the 
invention results in the release of C or T from the 
cellular DNA, which may lead to transition mutations 
upon replication. 

Mutagenesis of cells, e.g. mammalian cells, may also
be performed by introducing the DNA glycosylase
protein of the invention into the cell. This may be
performed using, for example, liposomes or other
appropriate techniques known in the art.

TDG or CDG may also be used to induce mutations
specifically in either the cell nucleus or the
mitochondria of eukaryotic cells. To obtain mutations in
the nuclear DNA, a cDNA with the complete open reading
frame of UNG2 may be expressed, carrying a site-directed
mutation in codon 204 (preferably Asn204Asp) or in codon
147 (preferably Tyr147Ala); the N-terminal amino acid
sequence of UNG2 contains a nuclear localization signal,
as described previously. To obtain mitochondrial
mutations specifically, a cDNA with the complete reading
frame of UNG1 may instead be expressed, carrying similar
site-directed mutations; the N-terminal amino acid
sequence of UNG1 contains a mitochondrial localization
signal, as described previously. For this purpose any
expression vector applicable to eukaryotic cells may be
used, but preferably the vector system should be
inducible. Any transfection method may be used to
introduce the expression vectors into the cells.
Alternatively, the same proteins may be expressed and
purified and then introduced into the cells by liposome
technology or other appropriate techniques in the art,
as mentioned previously.

Combined in vitro/in vivo mutagenesis may also be
performed. For example, an isolated restriction
fragment of interest (or possibly the whole plasmid) may
be treated with limited amounts of cytosine-DNA
glycosylase or thymine-DNA glycosylase. Subsequently,
the treated fragment may be reinserted into a vector and
transformed into E. coli cells (the cells may also be
pre-treated with a DNA-damaging agent to ensure
error-prone SOS repair). As a result of the
mutagenicity of AP-sites, this should yield random
mutations.

The Examples below describe the induction of
mutations in bacterial cells by the expression within
such cells of a DNA glycosylase, i.e. a CDG or TDG
according to the present invention. Expression of the
DNA glycosylases of the invention in the transformed
cells causes an increase in mutation frequencies.
Similar results may be obtained with other cells. To
enhance mutagenesis, strains, both prokaryotic and
eukaryotic, may be used which are defective in the
repair of AP-sites or are otherwise hypermutable, e.g.
bacterial mutants that are defective in endonuclease IV
or exonuclease III, or both, or other mutants that
similarly enhance the yield of mutations.

Thus, the use of one or more DNA glycosylases
according to the invention in in vitro and/or in vivo
mutagenesis systems provides yet further aspects of this
invention.

Another use of DNA glycosylases of the invention
involves DNA modification. By treating any type of DNA
(single- or double-stranded) in vitro with a DNA
glycosylase according to the invention, naturally
occurring C or T will be released, leaving an
apyrimidinic site (AP-site). Subsequent treatment of
this DNA with alkaline solutions or with enzymes that
recognise AP-sites, such as apurinic/apyrimidinic-site
endonucleases (AP endonucleases), will cause breaks in
the DNA at the AP-sites. This method may therefore be
used for the random cleavage of DNA. The number of
cleaved sites will depend on the amount of the DNA
glycosylase according to the invention used, thus
allowing the number of AP-sites, and hence of breaks, to
be controlled. Uses of such methods include the removal
of possible contaminating DNA prior to PCR amplification
and the enzymatic sequencing of DNA. The random
cleavage of DNA can also be used to produce randomly
fragmented DNA of defined size ranges for different
purposes, for example for efficient hybridization of
DNA, for preparing genomic libraries or for removing
high-molecular-weight viscous DNA.
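The dependence of fragment size on the amount of enzyme can be illustrated with a toy simulation. This is a minimal sketch under stated assumptions, not a method taken from the Examples: `random_cleavage` is a hypothetical helper, each target base is assumed to be excised independently with a probability `p_excise` that stands in for the enzyme dose, and every resulting AP-site is assumed to be cleaved.

```python
import random


def random_cleavage(seq: str, target: str, p_excise: float,
                    rng: random.Random) -> list[int]:
    """Simulate glycosylase excision followed by complete AP-site cleavage.

    Hypothetical model: each occurrence of `target` (e.g. "C" for CDG) is
    excised independently with probability `p_excise`, and every AP-site is
    cleaved. The AP-site nucleotide itself is not counted in either flanking
    fragment. Returns the lengths of the resulting fragments.
    """
    fragments = []
    start = 0  # first position of the fragment currently being built
    for i, base in enumerate(seq):
        if base == target and rng.random() < p_excise:
            if i > start:  # skip empty fragments between adjacent cuts
                fragments.append(i - start)
            start = i + 1
    if start < len(seq):
        fragments.append(len(seq) - start)
    return fragments


# Usage: a larger p_excise ("more enzyme") gives more breaks and shorter fragments.
rng = random.Random(42)
dna = "".join(rng.choice("ACGT") for _ in range(100_000))
for p in (0.001, 0.01, 0.1):
    frags = random_cleavage(dna, "C", p, rng)
    print(f"p_excise={p}: {len(frags)} fragments, "
          f"mean length {sum(frags) / len(frags):.0f} nt")
```

In a random sequence about a quarter of the positions are C, so under this model the mean fragment length should fall roughly as 1/(0.25 × p_excise), which is the sense in which the enzyme amount sets the size range.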

One advantage of using a DNA glycosylase according
to the invention in such methods is that, in contrast to
nucleases, DNA glycosylases do not require divalent
cations, which is advantageous when buffers containing
divalent cations are undesirable. A further advantage
is that the DNA glycosylase may be inactivated by
heating the reaction mixture to 80°C for 15 minutes,
thus eliminating or substantially reducing its activity.

Uracil-DNA glycosylase has previously been shown to
be efficient in removing contaminating DNA prior to PCR
amplification. That method has the disadvantage that
only uracil-containing DNA can be removed, so that any
DNA to be removed prior to amplification had to be
prepared in uracil-containing form, using appropriate
uracil-containing primers. The DNA glycosylases
according to the present invention do not have this
requirement, since any contaminating DNA would be likely
to contain cytosine or thymine bases. Thus, CDG and/or
TDG according to the invention may be added to a
reaction mix and allowed to digest contaminating DNA.
After treatment, the enzyme(s) are inactivated before the
DNA sample is added and amplified, to avoid degradation
of the template or product.

Thus a further aspect of the invention provides the
use of one or more DNA glycosylases according to the
present invention for removing contaminating DNA prior
to PCR amplification. The use of one or more DNA
glycosylases according to the invention in DNA
modification provides a further aspect of the invention.
The term "modification" as used herein refers to all
forms of modifying or manipulating DNA, including
cleavage, base substitution or insertion, etc.

A DNA glycosylase according to the present invention
may also be used in a method for the killing of cells.
A DNA glycosylase according to the present invention may
be introduced into specific target cells by means of
known transformation techniques, liposomes, specific
targeting systems such as ligands that bind to specific
receptors, or any other suitable techniques. The DNA
glycosylase may be expressed in a tissue-specific manner
by placing a tissue-specific promoter upstream of the
DNA sequence encoding it. Examples of such
tissue-specific promoters are well known and are found,
for example, in the genes for a number of liver-specific
proteins such as albumin, blood clotting factors and
apolipoproteins; several hormones, such as human growth
hormone from the pituitary gland and insulin from the
islets of Langerhans in the pancreas, as well as
aromatase, involved in the estrogen biosynthetic
pathway; porphobilinogen deaminase, the third enzyme in
the heme biosynthetic pathway; glycoprotein IIb/IIIa,
which is expressed in maturing megakaryocytes; the zeta
subunit of the T-cell antigen receptor (TCR), which is
expressed in T-cells; CD14, expressed in monocytes and
macrophages; villin, expressed in certain epithelial
tissues; and tyrosinase, expressed in melanocytes and
melanomas. In some cases abnormal expression from
tissue-specific promoters has been observed in tumour
cells, and this may be exploited by using constructs of
the novel DNA glycosylases and the relevant
tissue-specific promoter.

When the DNA glycosylase is expressed it may
fragment the DNA in the cell and thereby kill the
cell. Specific cells may also be targeted through the
use of promoters containing other control elements, for
example promoters which are controlled in a
cell-cycle-dependent or temporal manner, or those
possessing regulatory elements responsive to internal or
external factors, e.g. promoters activatable by specific
inducers: by IPTG, which induces the lac promoter or lac
derivatives such as trc; by certain metals (e.g. the
metallothionein promoter); by certain hormones such as
dexamethasone and androgens (acting, for example, on the
promoter of the gene for prostate-specific antigen,
which is tissue-specific); by retinoic acid; and by
certain cytokines.

Conceivably, where enzymes of the invention exhibit
specific substrate requirements in the sequence
surrounding the base for excision, this specificity may
be exploited by appropriate low-level expression of the
DNA glycosylase, such that only DNA with the specific
sequence is made susceptible to degradation.

Thus a further aspect of the invention provides a
method of killing cells, comprising the steps of
introducing a DNA glycosylase according to the present
invention into a cell and expressing said DNA
glycosylase in the cell to an extent which results in
the killing of that cell. Preferably, the DNA
glycosylase according to the present invention is
contained within an expression vector, most preferably
a tissue-specific expression vector.

A further use of DNA glycosylases of the invention
is for performing enzymatic DNA sequencing. This may be
performed in a manner analogous to the chemical
sequencing method of Maxam and Gilbert (Maxam and
Gilbert (1980) Methods in Enzymology, 65: 499).
However, the Maxam-Gilbert procedure involves the use of
several very toxic chemicals, such as dimethyl sulfate
(DMS) and hydrazine (the latter is also explosive), and
use of the glycosylases of the invention presents a
considerable advantage. Enzymatic sequencing may be
performed, for example, by end-labelling the sample DNA
fragment appropriately, e.g. with 32P, 33P or 35S.
To identify the positions of cytosines and thymines
in the DNA, the DNA is treated with limiting amounts of
cytosine-DNA glycosylase and thymine-DNA glycosylase
according to the invention, respectively, in separate
reactions. The resulting AP-sites are then cleaved, e.g.
by alkaline solution (pyridine) or by an AP-endonuclease.
The resulting end-labelled fragments are subsequently
separated, e.g. by electrophoresis, and the positions of
fragments of varying length identified appropriately,
e.g. by autoradiography. Ideally, the positions of
adenines and guanines should be determined in the same
way using adenine- or guanine-DNA glycosylases. At
present, such enzymes are not available. However, the
E. coli DNA repair enzymes Tag and AlkA recognize adenine
alkylated in the 3-position (Tag, AlkA) and guanine
alkylated in the 3-position (AlkA). Thus, one way of
determining the positions of adenines and guanines may
be to alkylate the DNA with DMS and then treat it with
AlkA and Tag. The subsequent steps may be performed as
for determining the C and T positions.
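How the C and T lanes are read can be sketched as an idealized model. The assumptions are strong ones: the fragment is labelled at its 5' end only, limiting enzyme gives at most one cleavage per labelled molecule, and cleavage at the AP-site at 1-based position i leaves a labelled fragment of exactly i - 1 nucleotides (in practice the chemistry of the cleaved end also affects mobility). The helper names `ladder` and `read_pyrimidines` are hypothetical.

```python
def ladder(seq: str, base: str) -> list[int]:
    """Labelled fragment lengths expected in one lane (e.g. the CDG lane for "C").

    A 5'-end-labelled molecule cleaved at the AP-site at 1-based position i
    keeps nucleotides 1..i-1, i.e. i - 1 nucleotides, which equals the
    0-based index of the excised base.
    """
    return [idx for idx, b in enumerate(seq) if b == base]


def read_pyrimidines(c_lane: list[int], t_lane: list[int], length: int) -> str:
    """Rebuild the C/T pattern from band lengths.

    A band of length n places the excised base at 0-based index n. Positions
    with no band in either lane are shown as "-" (A or G, which these two
    enzymes do not reveal).
    """
    calls = ["-"] * length
    for n in c_lane:
        calls[n] = "C"
    for n in t_lane:
        calls[n] = "T"
    return "".join(calls)


seq = "GACTTCAG"  # toy sequence, for illustration only
c_bands, t_bands = ladder(seq, "C"), ladder(seq, "T")
print(c_bands, t_bands)                                # [2, 5] [3, 4]
print(read_pyrimidines(c_bands, t_bands, len(seq)))   # --CTTC--
```

Reading the combined ladder from the shortest band upwards therefore gives the pyrimidine positions in 5' to 3' order; the purine positions would need a further pair of lanes, as discussed above.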

Thus, a further aspect of the invention provides a
method of performing enzymatic DNA sequencing to
determine the position of cytosine and/or thymine bases
by treating said DNA with at least one CDG and/or TDG of
the invention.

The invention will now be described more
specifically in the following non-limiting Examples with
reference to the following drawings, in which:

FIGURE 1 comprises graphs showing in vitro excision of
radiolabelled material from double-stranded (ds) or
single-stranded (ss) [3H]cytosine-labelled DNA substrate
(C-substrate) and [3H]thymine-labelled DNA substrate
(T-substrate) by human UDG mutants (CDG: Panel A, TDG:
Panel B). The data represent mean values from two
independent experiments, each in duplicate, for each time
point. Symbols in panel B of Figure 1 are as indicated
in panel A;

FIGURE 2 comprises graphs showing analysis of the
radioactive excision products of substrate DNA by UDG
(Panel A) and the UDG mutants TDG (Panel B) and CDG
(Panel C), performed by thin layer chromatography.
U-substrate is indicated by stars (*); other symbols are
as in Figure 1. The migration of unlabelled standards
(the free bases uracil, cytosine and thymine) is
indicated by rectangles, marked U-marker, C-marker and
T-marker respectively, over the relevant fraction numbers;

FIGURE 3 shows a revised organisation of the human UNG
gene. The restriction maps with EcoRI, HindIII, SacI
and XbaI are indicated. Exons are shown as black boxes
and are numbered with Roman numerals. Exon 1A is a
previously unrecognised exon. Interspersed repeats are
indicated (-: Alu, •: MER, ♦: MIR, *: position of a 300
bp TA dinucleotide repeat);

FIGURE 4 shows the generation of human UNG1 and UNG2 by
transcription from two promoters and alternative
splicing. P2 is the previously recognised promoter for
transcription of UNG1 (Haug et al., 1994, FEBS Letters,
353, p180-184) and P1 the promoter from which UNG2 is
transcribed. Exon 1A encodes 44 amino acids present in
UNG2, but absent in UNG1. The 35 N-terminal codons of
exon 1B are only present in UNG1. The presequence of
UNG2 is shown on top with the putative nuclear
localization signal underlined. The presequence of UNG1
directing mitochondrial import is shown in the bottom
line;

FIGURE 5 shows the structure of the 5'-terminal part of
the human UNG gene (SEQUENCE ID No. 7). Bold letters
indicate exons (1A and 1B);

FIGURE 6 shows the alignment of UNG proteins from man
and mouse (SEQUENCE ID Nos 8 (hUNG1), 9 (mUNG1), 2
(hUNG2) and 10 (mUNG2)). Note that the UNG1 and UNG2
proteins have been aligned separately down to the common
splice site corresponding to codon 44 in human UNG2. The
presequence not present in the catalytically active form
of human placental uracil-DNA glycosylase originally
isolated (residues 1-77 in human UNG1; Wittwer et al.,
1989, Biochemistry, 28, p780-784) is shown in bold
letters. Downstream of the alternative splice site (i)
used for generating UNG2 forms (from residue 45 in human
UNG2), the sequences of the two forms are identical in
each species. Residues that make up the walls of the
uracil-binding pocket or which are directly involved in
catalysis are marked with a star (*). Residues that are
involved in DNA binding (except those involved in
uracil binding) are marked with a triangle (▼); and

FIGURE 7 shows the subcellular localization in HeLa
cells of UNG2-EGFP-N1 and UNG1-EGFP-N1 fusion products.
HeLa cells were transfected with constructs expressing
pUNG2-EGFP-N1 (C), pUNG1-EGFP-N1 (D) or the control
pEGFP-N1 (A), all expressed from the CMV promoter, and
processed for confocal microscopy. Panel B shows
staining of mitochondria with Texas red.

Example 1

Site-directed mutagenesis of human UDG codons
Site-directed mutagenesis was performed on the relevant
codons in human UDG, and the proteins were expressed in
Escherichia coli.

Methods

Site-directed mutagenesis was carried out as in Mol et
al., 1995, supra. To obtain the Tyr147Ala mutant, codon
147, TAT (Tyr), was changed to GCT (Ala), and to obtain
the Asn204Asp mutant, codon 204, AAC (Asn), was changed
to GAC (Asp). Mutated DNA fragments were subcloned into
the human UDG expression construct pTUNGΔ84 by replacing
restriction fragments in the expression construct with
fragments containing the respective mutations. In
Escherichia coli, pTUNGΔ84 expresses high levels of a
fully active human UDG (UNGΔ84) lacking 7 non-essential
and non-conserved NH2-terminal residues of the mature
form of UDG (Mol et al., 1995, supra; Slupphaug et al.,
1995, supra). Expression of the mutant proteins in
Escherichia coli and their purification to apparent
homogeneity were carried out as described previously
(Mol et al., 1995, supra; Slupphaug et al., 1995, supra).
Relevant fractions were assayed for DNA glycosylase
activity at each step of the purification. Because of
the high level of expression, purification may also take
advantage of the UV absorption of the enzymes. Peaks of
UV absorption corresponding to the enzyme of interest
could already be observed after only the first two
column steps.
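The codon changes above can be checked in silico before mutagenic primers are designed. The sketch below uses a hypothetical helper, `substitute_codon`, and a deliberately tiny codon table covering only the codons named in this Example; the three-codon toy sequence is invented for illustration and is not the UNG coding sequence.

```python
# Minimal codon table: only the codons mentioned in this Example.
CODONS = {"TAT": "Tyr", "GCT": "Ala", "AAC": "Asn", "GAC": "Asp"}


def substitute_codon(cds: str, codon_number: int, expected_old: str,
                     new_codon: str) -> str:
    """Replace codon `codon_number` (1-based) in an in-frame coding sequence.

    Raises ValueError if the existing codon does not encode `expected_old`,
    which guards against off-by-one errors in codon numbering.
    """
    start = 3 * (codon_number - 1)
    old_codon = cds[start:start + 3]
    if CODONS.get(old_codon) != expected_old:
        raise ValueError(f"codon {codon_number} is {old_codon!r}, not {expected_old}")
    return cds[:start] + new_codon + cds[start + 3:]


toy = "ATGTATAAC"  # hypothetical Met-Tyr-Asn, not the UNG sequence
mutant = substitute_codon(toy, 2, "Tyr", "GCT")     # Tyr -> Ala (TAT -> GCT)
mutant = substitute_codon(mutant, 3, "Asn", "GAC")  # Asn -> Asp (AAC -> GAC)
print(mutant)  # ATGGCTGAC
```

Checking the expected wild-type residue before substituting is a cheap safeguard, since residue numbering can differ between the full-length protein and truncated expression constructs such as UNGΔ84.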

To test enzymatic substrate specificities, 250 ng of
purified human "wild type" UDG (UNGΔ84), UNGΔ84Tyr147Ala
or UNGΔ84Asn204Asp was mixed with 200 ng ds- or ss-
[3H]cytosine-labelled DNA (150 mCi/mmol), or ds- or ss-
[3H]thymine-labelled DNA (100 mCi/mmol), in 10 mM NaCl,
20 mM Tris-HCl (pH 7.5), 1 mM EDTA, 1 mM dithiothreitol
and 0.5 mg/ml bovine serum albumin (final concentrations)
in separate 20 µl reactions. The final concentrations of
the [3H]cytosine-DNA (C) and [3H]thymine-DNA (T)
substrates were 6.5 µM and 10 µM, respectively. Release
of radioactivity as a function of time was measured at
37°C. These conditions are referred to below as
standard conditions. Substrate synthesis and processing
of samples for scintillation counting were as described
in Krokan and Wittwer (1981) Nucl. Acids Res., 9:
2599-2613. Single-stranded substrate was generated by
boiling double-stranded substrate for 10 min, followed
by rapid cooling on ice.

Results

Figure 1 demonstrates time-dependent release of
acid-soluble radioactivity by homogeneous UNGΔ84Asn204Asp
(CDG) from [3H]cytosine-labelled DNA, but not from
[3H]thymine-labelled DNA. Conversely, the homogeneous
UNGΔ84Tyr147Ala (TDG) mutant releases acid-soluble
radiolabelled material from [3H]thymine-labelled DNA, but
not from [3H]cytosine-labelled DNA.

Analysis of radioactive excision products by thin
layer chromatography
Methods

The analysis was performed using DC-cellulose as the
stationary phase and methanol:HCl:H2O (70:20:10) as the
mobile phase. Samples were prepared as follows: 1.5 µg
enzyme (UNGΔ84, UNGΔ84Tyr147Ala or UNGΔ84Asn204Asp, as
prepared in Example 1) was incubated with 1 µg
[3H]uracil-labelled DNA (500 mCi/mmol),
[3H]cytosine-labelled DNA (150 mCi/mmol) or
[3H]thymine-labelled DNA (100 mCi/mmol) in separate
50 µl reactions under standard buffer conditions (see
Example 1) for 1 hour. Macromolecules in the samples
were then ethanol precipitated, the supernatants after
centrifugation were collected, ethanol was removed by
evaporation and the remaining material was dissolved in
10 µl H2O. 1 µl was spotted on the membrane. After
migration the cellulose sheet was cut into strips and
radioactivity was measured by scintillation counting in
Ready Protein scintillation cocktail.

Results

Separation of the acid-soluble radioactive material by
thin layer chromatography (Figure 2) demonstrated that
the released material was the free bases [3H]cytosine or
[3H]thymine. Separation using another mobile phase
(butanol:H2O, 86:14) verified these results (data not
shown). In addition, both mutants release [3H]uracil,
whereas "wild type" UDG (UNGΔ84) releases [3H]uracil
only (Figure 2).

Example 3

Substrate specificity, uracil inhibition and kinetic
properties of UNG mutants

Methods

To measure release of uracil from double-stranded (U-ds)
or single-stranded (U-ss) DNA, the various mutant enzymes
(prepared as described in Example 1, by analogous
site-directed mutagenesis methods and identical
expression and purification methods) were incubated with
200 ng ds- or ss-[3H]dUMP-labelled DNA (500 mCi/mmol,
final concentration) in separate 20 µl reactions for
10 min under standard conditions as described in
Example 1. To measure release of [3H]cytosine or
[3H]thymine, assays were performed as described in
Example 1 using an incubation time of 10 min. Uracil
inhibition was analysed by adding 5 mM uracil (final
concentration) to a standard U-ds assay. An activity of
0 indicates activity below the detection limit (10 pmol
per mg protein per min) with 100 ng enzyme and 200 ng
DNA substrate under standard conditions.

The kinetic parameters Km and Vmax were determined
using six different substrate concentrations. Duplicate
samples were incubated for 20 min using standard assay
buffer conditions and substrates as specified. Km and
Vmax were calculated using the computer program Enzpack,
version 3.0, after the method of Wilkinson (1961)
Biochem. J., 80: 324-332. kcat was calculated from Vmax
assuming an Mr of 25000.
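The conversion from Vmax to kcat, and one simple way of estimating Km and Vmax from six substrate concentrations, can be sketched as follows. Note that the Example used Wilkinson's (1961) method as implemented in Enzpack; this sketch swaps in the Hanes-Woolf linearization ([S]/v plotted against [S]) fitted by ordinary least squares, which is simpler but weights experimental errors differently. The function names and the example data are hypothetical.

```python
def kcat_per_min(vmax_pmol_per_min_per_mg: float, mr: float = 25000.0) -> float:
    """Turnover number (min^-1) from Vmax, assuming one active site per Mr.

    1 pmol/min per mg = 1e-12 mol / 1e-3 g per min = 1e-9 mol per g per min;
    multiplying by Mr (g/mol) gives mol substrate per mol enzyme per min.
    """
    return vmax_pmol_per_min_per_mg * 1e-9 * mr


def hanes_woolf_fit(s: list[float], v: list[float]) -> tuple[float, float]:
    """Return (Km, Vmax) by least-squares regression of [S]/v on [S].

    Hanes-Woolf form of Michaelis-Menten: [S]/v = [S]/Vmax + Km/Vmax,
    so slope = 1/Vmax and intercept = Km/Vmax.
    """
    y = [si / vi for si, vi in zip(s, v)]
    n = len(s)
    mean_x, mean_y = sum(s) / n, sum(y) / n
    sxx = sum((x - mean_x) ** 2 for x in s)
    sxy = sum((x - mean_x) * (yi - mean_y) for x, yi in zip(s, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    vmax = 1.0 / slope
    return intercept * vmax, vmax


# Usage with noise-free synthetic data (hypothetical Km and Vmax).
km_true, vmax_true = 0.5, 2.0e6           # µM, pmol/min per mg
s = [0.1, 0.2, 0.5, 1.0, 2.0, 5.0]        # six substrate concentrations (µM)
v = [vmax_true * x / (km_true + x) for x in s]
km, vmax = hanes_woolf_fit(s, v)
print(round(km, 3), round(vmax), round(kcat_per_min(vmax), 1))  # 0.5 2000000 50.0
```

With real, noisy rates a weighted nonlinear fit such as Wilkinson's is generally preferable, because the linearization distorts the error structure; the linear form is used here only because it needs no external library.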


Results

The results are shown in Tables 1 and 2. From Table 1
it can be seen that only the substitution Tyr147Ala
results in an enzyme which specifically excises thymine.
Similarly, only the substitution Asn204Asp results in a
mutant which excises cytosine. Both mutant enzymes
exhibit activity on single- or double-stranded DNA and
are also able to excise uracil. From Table 2 it can be
seen that the turnover numbers of CDG and TDG are lower
than that of "wild type" UDG for the release of uracil.

Discussion

These results demonstrate the significance of Asn204 for
specific binding of uracil-containing DNA and the
significance of the Tyr147 side-chain ring structure for
preventing binding of thymine.

It is somewhat surprising that the novel CDG of the
invention still recognizes uracil, considering the
unfavourable proximity of the Asp carboxyl side chain
and the O4 atom in uracil. However, it should be noted
that the other oxygen atom of the Asp204 carboxyl side
chain may still form H-bonds with N3 of uracil, and that
the Asp145 main-chain carbonyl, as well as the amide-N of
Asp145 and Gln144, also contribute to the specificity.
In addition, the remaining UDG activity is very low
(0.04-0.16%) compared with "wild type". CDG has a
10-fold increased preference for single-stranded
substrate, whereas TDG has a decreased preference
(Figure 1 and Table 1).
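The single- versus double-stranded preferences stated above follow directly from the Table 1 rates. The short calculation below uses only Table 1 values; the dictionary layout and the helper name `ss_ds_ratio` are illustrative choices.

```python
# Rates from Table 1 (pmol excised per min per mg protein).
table1 = {
    "Tyr147Ala (TDG)": {"T-ds": 1.3e3, "T-ss": 7.5e2},
    "Asn204Asp (CDG)": {"C-ds": 3.0e2, "C-ss": 3.0e3},
}


def ss_ds_ratio(rates: dict[str, float], base: str) -> float:
    """Ratio of single- to double-stranded activity for one base."""
    return rates[f"{base}-ss"] / rates[f"{base}-ds"]


cdg = ss_ds_ratio(table1["Asn204Asp (CDG)"], "C")
tdg = ss_ds_ratio(table1["Tyr147Ala (TDG)"], "T")
print(f"CDG ss/ds = {cdg:.1f}, TDG ss/ds = {tdg:.2f}")  # CDG ss/ds = 10.0, TDG ss/ds = 0.58
```

A ratio above 1 means the enzyme prefers single-stranded substrate; CDG's ratio of 10 corresponds to the 10-fold preference, while TDG's ratio below 1 reflects its relative preference for double-stranded DNA.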

It is evident that the turnover numbers (kcat) of the
novel enzymes releasing either cytosine or thymine, as
well as their residual UDG activities, are very low when
compared with release of uracil by "wild type" UDG
(Table 2). However, the very high turnover number of
UDG appears to be unique among DNA glycosylases, and
turnover numbers of other DNA glycosylases may be as
low as, or even lower than, those of the engineered
glycosylases CDG and TDG. Thus, a recent biochemical
characterisation of recombinant N-methylpurine-DNA
glycosylase from mouse gave kcat values of 0.8 min^-1 and
0.2 min^-1 for excision of 3-methyladenine and
7-methylguanine, respectively (Roy et al. (1994)
Biochemistry, 33: 15131-15140).

The Escherichia coli inducible 3-methyladenine-DNA
glycosylase II (AlkA) is a DNA glycosylase that
recognizes at least 6 different damaged bases, among
these structurally different alkylated purines and
pyrimidines. The turnover number for AlkA on the
substrate 3-methyladenine-DNA is calculated to be 0.03
min^-1 (Bjelland et al. (1994) J. Biol. Chem., 269:
30489-30495).

The FPG protein (formamidopyrimidine-DNA glycosylase)
also has a rather low turnover number. Its kcat value on
the imidazole ring-opened form of 7-methylguanine-DNA
substrate is calculated to be 1.4 min^-1 (Boiteux et al.
(1990) J. Biol. Chem., 265: 3916-3922). A low rate of
catalysis is also likely for the naturally occurring
T(U)/G-mismatch-DNA glycosylase, since band shifts can be
demonstrated after mixing the enzyme with substrate
(Sassanfar & Roberts (1990) J. Mol. Biol., 212: 79-96).

All of these DNA glycosylases recognize at least two
different substrates, and in most cases several damaged
pyrimidines or purines. Probably the very high turnover
number of UDG reflects a high selectivity of substrate
binding in a tight-fitting active site, allowing rapid
catalysis by this specialized enzyme. In contrast, the
DNA glycosylases with a broader substrate specificity
may bind substrate less accurately and excise the base
more slowly.

TABLE 1

              pmol excised per min per mg protein                                     Inhibition by
Mutant        U-ds        U-ss        C-ds        C-ss        T-ds        T-ss        5 mM uracil (%)
"Wild type"   4.7 x 10^7  9.5 x 10^7  0           0           0           0           80
Gln144Leu     3.4 x 10^4  4.8 x 10^4  0           0           0           0           25
Asp145Glu     5.5 x 10^4  8.5 x 10^4  0           0           0           0           80
Asp145Asn     1.4 x 10^4  1.1 x 10^4  0           0           0           0           80
Tyr147Ala     2.2 x 10^4  2.2 x 10^4  0           0           1.3 x 10^3  7.5 x 10^2  0
Tyr147Phe     3.2 x 10^7  6.3 x 10^7  0           0           0           0           50
Ser169Ala     3.1 x 10^6  5.6 x 10^6  0           0           0           0           80
Asn204Asp     1.7 x 10^4  1.6 x 10^5  3.0 x 10^2  3.0 x 10^3  0           0           0
Asn204Gln     1.5 x 10^6  2.2 x 10^6  0           0           0           0           70
His268Leu     1.3 x 10^?  2.6 x 10^5  0           0           0           0           75



WO 97/25416 PCT/GB97/00057

TABLE 2

Substrate:  C-ds, C-ss, T-ds, T-ss, U-ds, U-ss
For each substrate: Km (µM) and kcat (min^-1)

0.10 2500 0.06 5150 
6.0 0.06 1.4 0.02 3.5 1.0 0.30 0.6 
0.16 1225 0.10 2370 
35 0.12 5.3 0.39 - - - 2.4 1.2 2.0 15 
0.40 66 0.23 89 



Example 4

Effect of TDG and CDG activity on the frequency of
rifampicin-resistant mutations in an E. coli ung+ strain
(NR8051) and an E. coli ung- strain (NR8052)
Methods

An overnight culture of E. coli strains NR8051 and
NR8052 (both recA+, provided by Thomas Kunkel of the
National Institute of Environmental Health Sciences,
USA) containing plasmids pTrc99A, pTUNGΔ84,
UNGΔ84Tyr147Ala or UNGΔ84Asn204Asp (prepared as
described in Example 1) was grown in LB medium with
ampicillin (100 µg/ml) at 30°C. The culture was then
diluted 1:20 in fresh medium and cultured for 5 hours at
37°C. Induced cultures contained 1 mM IPTG in the LB
medium. To determine the number of rifampicin-resistant
bacteria, 100 µl of the culture was mixed with 3 ml top
agarose, poured on LB plates containing 100 µg/ml
rifampicin and incubated overnight at 37°C. Colonies
were counted and the number of rifampicin-resistant
colonies per 10^8 viable cells was calculated.
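The frequency and fold-change arithmetic used in this Example can be written out as below. This is a sketch only: the helper names are hypothetical, the viable-cell titre is assumed to be measured separately on non-selective plates, the 10^8 normalisation follows the Methods above, and the colony counts and titre in the usage lines are invented placeholders, not data from Table 3.

```python
def rif_mutants_per_1e8(rif_colonies: int, viable_cells_plated: float) -> float:
    """Rifampicin-resistant colonies per 10^8 viable cells plated."""
    return rif_colonies / viable_cells_plated * 1e8


def fold_change(test_freq: float, control_freq: float) -> float:
    """Mutation frequency relative to the control, e.g. cells carrying pTrc99A."""
    return test_freq / control_freq


# Hypothetical numbers: 100 µl plated from a culture with 2.0e9 viable cells/ml.
viable_plated = 2.0e9 * 0.1
control = rif_mutants_per_1e8(12, viable_plated)  # parental vector
test = rif_mutants_per_1e8(48, viable_plated)     # glycosylase-encoding plasmid
print(f"control {control:.1f}, test {test:.1f}, fold {fold_change(test, control):.1f}")
# control 6.0, test 24.0, fold 4.0
```

Because the same volume and titre enter both frequencies, the fold-change depends only on the ratio of rifampicin-resistant colony counts when the control and test cultures reach the same viable titre.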



TABLE 3

Mutant: "Wild type", Tyr147Ala, Tyr147Phe, Asn204Asp, Asn204Gln



Results 

The results are shown in Table 3. These results
indicate that the expression of UDG does not cause an
increase in mutation frequencies (plasmid pTUNGΔ84
compared with the parental pTrc99A). In fact, human UDG
complements E. coli ung- cells. This is clear from the
reduction in mutation frequencies from 4.4 to 1.3 when
UDG is present in induced cells. Uninduced cells are
also protected as a result of promoter leakage. In
contrast, the mutation frequencies of E. coli ung*
cells are increased by factors of 8.6 and 39 when
carrying plasmids encoding CDG and TDG, respectively,
compared with host cells carrying the parental plasmid
pTrc99A. These factors increase to approximately 8.9 and
94.4, respectively, in induced ung* cells.

Discussion

Single amino acid substitutions transform the highly
uracil-selective uracil-DNA glycosylase into less
selective DNA glycosylases that attack normal
pyrimidines and confer a mutator phenotype upon the
cell, presumably because excess numbers of
apyrimidinic sites are formed.

It may seem surprising that propagation of plasmids
expressing CDG or TDG activity is possible at all,
since they might be expected to kill the host cells.
We believe that the relatively low turnover numbers and
the low expression in the absence of inducer (IPTG) are
sufficient to reduce the number of depyrimidinations to
a level that the DNA repair system can cope with.
Nevertheless, DNA degradation is detectable even in the
absence of inducer and is strongly increased when IPTG
is added (data not shown). The survival of Escherichia
coli recA+ host cells carrying an uninduced CDG or TDG
plasmid is equal to that of the parental cells carrying
plasmid pTrc99A (data not shown), although mutation
frequencies are increased by factors of 8.6 and 39 for
CDG and TDG, respectively, as mentioned above.
Induction of CDG or TDG by IPTG reduces survival of the
Escherichia coli host cells (in both NR8051 and NR8052)
to less than 50% and 10%, respectively, within 5 hours.
Thus, AP-site repair capacity is sufficient to repair
the damage caused by expression of CDG or TDG due to
"leakage" from the uninduced promoter. However, this
repair is apparently not complete, or may be
inaccurate, since the frequency of mutations leading to
rifampicin resistance is significantly increased by
induction with IPTG (Table 3). The activity of TDG in
vivo leads to a 10-fold higher mutation frequency in
Escherichia coli than the in vivo CDG activity. This
probably reflects the fact that TDG has a higher
activity on dsDNA than CDG, as demonstrated by in vitro
experiments with homogeneous enzyme (Figure 1), and
that the Km value for TDG on dsDNA is much lower than
that for CDG on dsDNA (Table 2). We have observed
that TDG and CDG are both highly cytotoxic in a recA-
background (Escherichia coli DH5α) even without
induction (data not shown). It is likely that this
cytotoxic effect is due to a lack of SOS induction in
recA- cells. The chemical nature of the SOS-inducing
signal, or signals, is not fully known, and some DNA
lesions may indirectly activate the SOS response by
interfering with DNA replication (Sassanfar & Roberts,
1990, supra). If generation of AP-sites by TDG and CDG
directly or indirectly triggers SOS induction, this
would increase cell survival, at the cost of error-prone
repair and a high yield of mutations. CDG and
TDG should be very useful for exploring the biological
consequences of AP-sites in DNA.

The new DNA glycosylases that we have engineered are
distinctly different from previously known
glycosylases. The mismatch-specific thymine-DNA
glycosylase previously reported also releases uracil
(Sassanfar & Roberts, 1990, supra; Nedderman & Jiricny,
1993, supra), like the thymine-DNA glycosylase we have
constructed. However, while the naturally occurring
thymine-DNA glycosylase has an absolute requirement for
a mismatched U or T opposite a G, the TDG we have
engineered recognises T or U from T(U):A matches, as
well as from single-stranded substrate. A DNA
glycosylase recognizing unmodified cytosine had
previously not been reported.

The mutator phenotype caused by a single amino acid
substitution is intriguing since it changes an enzyme
from its normal role in mutation avoidance into a
cytotoxic mutator protein. In the case of CDG this
change is the result of a single A→G transition, which
in vivo could be the result of several different
events, such as deamination of A, O4-alkylation of T in
the complementary strand, and replication errors.
Since this mutation would be dominant, only one allele
would need to be mutated to get a new phenotype. It is
possible, however, that this mutation would be lethal,
or that it would be without serious consequences due to
efficient repair of DNA in mammalian cells.
Nevertheless, the generation of repair enzymes having a
dominant mutator effect that would give the cells a
hypermutable phenotype may represent a new principle in
mutagenesis.
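The single A→G argument can be checked against SEQUENCE I.D. No 1. The Python sketch below rests on two assumptions taken from this document rather than from an external database: the UNG2 open reading frame begins with the ATG at nucleotide 71, and UNG1 residue numbers convert to UNG2 numbering by adding 9 (44 minus 35 form-specific N-terminal residues). It reads the codon for Asn204 from nucleotides 701-720 of the claimed sequence. The constant and function names are illustrative only.

```python
# Locate the codon for Asn204 (UNG1 numbering) in the UNG2 cDNA of SEQUENCE I.D. No 1
# and show that a single A->G transition at its first base gives Asp (the CDG change).
ORF_START_NT = 71          # assumption: first base of the UNG2 ATG in SEQ ID No 1
UNG2_MINUS_UNG1 = 44 - 35  # assumption: numbering offset between UNG2 and UNG1 (= 9)

# Nucleotides 701-720 of SEQ ID No 1, copied from the claimed sequence.
SNIPPET_START = 701
SNIPPET = "CTTCTCAACGCTGTCCTCAC"

# Only the codons needed for this illustration.
CODONS = {"AAT": "Asn", "AAC": "Asn", "GAT": "Asp", "GAC": "Asp"}


def codon_start(ung1_residue: int) -> int:
    """SEQ ID No 1 coordinate of the first base of the codon for a UNG1 residue."""
    ung2_residue = ung1_residue + UNG2_MINUS_UNG1
    return ORF_START_NT + 3 * (ung2_residue - 1)


def codon_at(nt: int) -> str:
    """Three bases starting at SEQ ID No 1 coordinate nt (must lie in SNIPPET)."""
    offset = nt - SNIPPET_START
    return SNIPPET[offset:offset + 3]


start = codon_start(204)        # UNG2 residue 213
wild_type = codon_at(start)
mutant = "G" + wild_type[1:]    # A->G transition at the first codon position
print(start, wild_type, CODONS[wild_type], "->", mutant, CODONS[mutant])
# prints: 707 AAC Asn -> GAC Asp
```

The same offset arithmetic places Tyr147 (the TDG site) at the TAT codon starting at nucleotide 536.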

TABLE 3

Frequency of rifR mutations per 10^8 cells

                       NR8051                    NR8052
Plasmid                Uninduced    Induced      Uninduced    Induced
pTrc99A                0.8 ± 0.3    0.9 ± 0.5    4.2 ± 1.2    4.4 ± 1.1
pTUNGA84               0.7 ± 0.4    0.8 ± 0.3    0.9 ± 0.2    1.3 ± 0.4
pTUNGA84Tyr147Ala      31 ± 8       85 ± 32      13 ± 5       57 ± 9
pTUNGA84Asn204Asp      6.9 ± 4.1    8.0 ± 3.4    2.1 ± 0.7    4.1 ± 1.1
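The fold changes quoted in the text can be recomputed from the mean values in Table 3. The following Python sketch ignores the standard deviations and divides each mean by the corresponding pTrc99A (vector) mean. The dictionary keys are illustrative labels, not plasmid names.

```python
# Mean rifR frequencies from Table 3 (standard deviations ignored).
# Column order: NR8051 uninduced, NR8051 induced, NR8052 uninduced, NR8052 induced.
TABLE3 = {
    "vector": (0.8, 0.9, 4.2, 4.4),    # parental plasmid pTrc99A
    "UDG": (0.7, 0.8, 0.9, 1.3),       # wild-type UDG construct
    "TDG": (31.0, 85.0, 13.0, 57.0),   # Tyr147Ala
    "CDG": (6.9, 8.0, 2.1, 4.1),       # Asn204Asp
}
NR8051_UNINDUCED, NR8051_INDUCED, NR8052_UNINDUCED, NR8052_INDUCED = range(4)


def fold_vs_vector(plasmid: str, column: int) -> float:
    """Mutation frequency relative to the vector in the same strain and condition."""
    return TABLE3[plasmid][column] / TABLE3["vector"][column]


for enzyme in ("CDG", "TDG"):
    print(enzyme,
          round(fold_vs_vector(enzyme, NR8051_UNINDUCED), 2),
          round(fold_vs_vector(enzyme, NR8051_INDUCED), 2))

# Wild-type UDG lowers the induced NR8052 frequency from 4.4 to 1.3.
udg_reduction = TABLE3["vector"][NR8052_INDUCED] / TABLE3["UDG"][NR8052_INDUCED]
# TDG versus CDG in induced NR8051 cells (the "10-fold" comparison in the Discussion).
tdg_over_cdg = TABLE3["TDG"][NR8051_INDUCED] / TABLE3["CDG"][NR8051_INDUCED]
print(round(udg_reduction, 1), round(tdg_over_cdg, 1))
```

For CDG and TDG, the ratios come out at roughly 8.6 and 39 without inducer and roughly 8.9 and 94 after induction, which are the factors quoted in the text. Wild-type UDG gives about a 3.4-fold reduction, and induced TDG exceeds induced CDG by about 10.6-fold.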
Example 5

Effects of TDG activity on the frequency of rifampicin-resistant
mutations in E. coli strains BW527 and GW2100

Overnight cultures of E. coli strains BW527 (endoIV)
or GW2100 (umuC), provided by Erling Seeberg, The
National Hospital, Oslo, containing plasmids pTrc99A or
UNGA84Tyr147Ala (TDG), were prepared as described in
Example 1 and grown in LB medium with ampicillin (100
µg/ml) at 30°C. The cultures were then diluted 1:20 in
fresh medium and cultured as described in Example 4.



Results

The results are shown in Table 4. These results
indicate that the mutagenic effect of UNGA84Tyr147Ala (TDG)
expression is enhanced in E. coli strains BW527 (endoIV) and
GW2100 (umuC-), compared to strains that do not carry these
defects in the repair of AP-sites or in umuC, especially after
induction with IPTG. pTrc99A alone did not exert this
effect to any significant extent. Even more
importantly, the background mutation frequencies in these
strains are low and the effects of induction with IPTG are high,
thus improving the usefulness of UNGA84Tyr147Ala (TDG)
for mutagenesis when more suitable strains are used.

These results are especially surprising in light of
previous findings that umuC mutants are generally
difficult to mutate by some methods, for example by UV
light or by chemical challenge.
TABLE 4

Effects of TDG activity on frequency of rifampicin-resistant
mutations in E. coli strains BW527 and GW2100

Frequency of rifR mutations per 10^8 cells

                       BW527                      GW2100
Plasmid                Uninduced    Induced       Uninduced               Induced
pTrc99A                0.06±0.02    0.07±0.03     0.24x10"3 ±0.1x10°      6x10°±5x10
pTUNGA84Tyr147Ala      1.20±0.2     240±122       0.65±3829               84±28
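The BW527 columns of Table 4 quantify the claim that induction has a large effect against a low background. The GW2100 columns are too damaged in this copy to use, so the Python sketch below covers BW527 only and ignores the standard deviations. It reads the damaged TDG uninduced entry "l-20" as 1.20, which is an assumption.

```python
# Mean rifR frequencies for strain BW527 (endoIV) from Table 4.
# The TDG uninduced value is read as 1.20 from the damaged entry "l-20" (assumption).
BW527 = {
    "vector": {"uninduced": 0.06, "induced": 0.07},  # pTrc99A
    "TDG": {"uninduced": 1.20, "induced": 240.0},    # pTUNGA84Tyr147Ala
}


def induction_factor(plasmid: str) -> float:
    """Induced over uninduced mutation frequency for one plasmid."""
    row = BW527[plasmid]
    return row["induced"] / row["uninduced"]


def over_vector(plasmid: str, condition: str) -> float:
    """Mutation frequency relative to the vector under the same condition."""
    return BW527[plasmid][condition] / BW527["vector"][condition]


print(round(induction_factor("vector"), 2), round(induction_factor("TDG")))
# prints: 1.17 200
print(round(over_vector("TDG", "uninduced")), round(over_vector("TDG", "induced")))
# prints: 20 3429
```

Under this reading, IPTG raises the TDG-dependent frequency about 200-fold, while the vector alone changes by less than 20%. Without inducer, TDG is about 20-fold above the vector; after induction it is roughly 3,400-fold above.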

Example 6

Isolation and characterisation of a nuclear form of
uracil-DNA glycosylase

Materials and Methods

Materials 

Mouse embryonic carcinoma cDNA library, human liver
cDNA library and NT2 neuronal precursor cell cDNA
library were from Stratagene (La Jolla, CA, USA). All
libraries were propagated in the Uni-ZAP™XR vector
using XL1-Blue as host. [α-32P]dCTP, [35S]methionine,
Rediprime random labelling kit and Hybond N+ filters
were all from Amersham (UK). All sequencing primers
were from MedProbe (Oslo, Norway). Dye terminator
cycle sequencing ready reaction kit was from Applied
Biosystems (Foster City, CA). The Dynazyme PCR kit was
purchased from Finnzymes Oy (Espoo, Finland). TNT in
vitro transcription/translation rabbit reticulocyte
lysate system kit, pGEM-T TA cloning kit, Altered Sites
II in vitro Mutagenesis System, primers for sequencing
from T3 and T7 promoters and T3 RNA polymerase were
from Promega (Madison, WI). The plasmid encoding the
red-shifted variant of green fluorescent protein
(pEGFP-N1) was from Clontech (Palo Alto, CA, USA).
Restriction enzymes were from New England Biolabs Inc.
(Beverly, MA, USA).

Screening of cDNA libraries

All libraries were screened as recommended by the
manufacturer, using 32P-labelled UNG40 cDNA (Olsen et
al., 1989, EMBO J., 8, p 3121-3125) as probe.
Hybridization was carried out at 65°C overnight in 6 x
SSC, 5 x Denhardt's solution and 0.1% SDS. Filters
were washed in 0.1 x SSC/0.5% SDS at 65°C and
autoradiographed. Three rounds of screening were done.
In vivo excision of pBluescript phagemids from the
Uni-ZAP™XR vector was performed as recommended by the
manufacturer.

Sequence analysis of clones

Sequencing was performed on an Applied Biosystems Model
373A DNA Sequencing System using the Dye terminator
cycle sequencing ready reaction kit as recommended by
the manufacturer. The sequences were analysed using
the Auto Assembler software (Applied Biosystems).



In vitro translation, uracil-DNA glycosylase assays
and transient transfection of HeLa cells for promoter
studies

In vitro transcription/translation was performed with
the TNT transcription/translation system with
[35S]methionine as recommended by the manufacturer,
using 200 ng of the expression constructs per 10 µl
reaction volume. The mouse UNG1-pBluescript construct
was transcribed from the T3 promoter in the pBluescript
vector. The insert of mouse UNG2-pBluescript was
amplified by the polymerase chain reaction using the
Dynazyme PCR kit, ligated into the pGEM-T vector and
transcribed from the T7 promoter. The human
UNG2-pBluescript was transcribed from the T3 promoter after
SacI/NheI excision of a 79 bp fragment from the
polylinker and the 5'-end of cDNA for UNG2. Human UNG1
cDNA was transcribed from the T7 promoter as previously
described (Slupphaug et al., 1995, Biochemistry, 34,
p 128-138). The samples were run on a 12% denaturing
sodium dodecyl sulfate polyacrylamide gel (SDS-PAGE).
The gel was dried, autoradiographed overnight and
scanned on an LKB Ultroscan XL Enhanced Laser
Densitometer. Uracil-DNA glycosylase activity was
measured in parallel samples of the in vitro
transcription/translation assay mixture containing
unlabelled amino acids (Slupphaug et al., 1995, supra).
A construct containing both promoters (pGL2-ProAB)
linked to the luciferase gene was prepared by insertion
of a PvuII/AflIII fragment (the enzymes cleave in
positions 418 and 1035, respectively) from the promoter
region of the UNG gene into the SmaI-MluI sites of
pGL2-ProB. A promoter II-luciferase construct
(pGL2-ProB) and transient transfection with Transfectam
(Promega) have been described previously (Haug et al.,
1994, FEBS Letters, 353, p 180-184).

Preparation of pUNG-EGFP-N1 fusion constructs and
localization studies

UNG15 cDNA, which encodes UNG1, in pGEM-7Zf(+) (pUNG15)
(Slupphaug et al., 1995, supra; Olsen et al., 1989,
supra) was digested with BclI, which cuts at bp 1019 in
UNG15 cDNA, blunted with DNA polymerase I (Klenow
fragment), and ligated to an AgeI linker prepared from
the oligonucleotide 5'-ACCGGTGCC-3' and its
complementary copy. The religated pUNG15 containing
the AgeI linker correctly ligated into the BclI site
(verified by sequencing) was digested with RsrII, which
cuts at bp 49 in UNG15 cDNA (Olsen et al., 1989,
supra), blunted as above and finally digested with
AgeI. The fragment was then ligated into pEGFP-N1
digested with SmaI (blunt) and AgeI. The construct was
sequenced to verify that it was in frame
with the ATG of the EGFP-N1 fusion protein. The TGA
stop codon of pUNG15 was changed to GGA by
site-directed mutagenesis performed according to the
procedure provided by the manufacturer using ssDNA
prepared with R408 phage. Potential pUNG1GGA-EGFP-N1
constructs were screened by digestion with BclI
(digests only unmutated plasmids) and verified by
sequencing. The correct construct was named
pUNG1-EGFP-N1. cDNA for UNG2 (this example) in pBluescript
was digested with NheI, which cuts 54 bp upstream of
ATG, and EcoNI, which cleaves the cDNAs in the sequence
that is shared by cDNAs for UNG1 and UNG2 (positions
529 and 520, respectively). The resulting fragment of
interest (501 bp) was isolated and ligated to the 5155
bp fragment of NheI/EcoNI-digested pUNG1-EGFP-N1 to
obtain pUNG2-EGFP-N1. Transient transfections of HeLa
cells were done with the CaPO4 method (ProFection,
Promega) according to the manufacturer's
recommendations. Confocal microscopy (BioRad MRC-600)
of HeLa cells and staining of mitochondria with mouse
anti-human mitochondria antibody (MAB1273, Chemicon)
and Texas Red anti-mouse IgG (Vector) were performed as
previously described (Nagelhus et al., 1995, Exptl.
Cell Res., 220, p 292-297). Examination of HeLa cells
transfected with expression plasmids pEGFP-N1,
pUNG1-EGFP-N1 or pUNG2-EGFP-N1 was carried out using an
excitation wavelength of 488 nm and an emission
wavelength >515 nm at 16 hours after transfection.
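The restriction screen works because the stop codon sits inside a restriction site. The Python sketch below is a toy model built on stated assumptions. It takes the enzyme that cuts at the stop codon and is used for screening to be BclI (T^GATCA). It uses nucleotides 1001-1025 of SEQUENCE I.D. No 1 (the UNG2 cDNA, whose 3' end is shared with UNG1) instead of the UNG15 coordinates quoted above, and it follows only the top strand. In this model, Klenow fill-in plus the AgeI linker regenerates a BclI site over the TGA codon, and the TGA to GGA change then destroys it, so BclI cuts only unmutated plasmids.

```python
# Toy model of the stop-codon region during construction of pUNG1-EGFP-N1 (top strand only).
BCLI = "TGATCA"        # BclI recognition site, cut T^GATCA (assumed enzyme)
AGEI = "ACCGGT"        # AgeI recognition site, cut A^CCGGT
LINKER = "ACCGGTGCC"   # top strand of the blunt AgeI linker
STOP_CODONS = {"TAA", "TAG", "TGA"}

# Nucleotides 1001-1025 of SEQ ID No 1, in frame from offset 0: AAG GAG CTG TGA ...
REGION = "AAGGAGCTGTGATCATCAGCTGAGG"
STOP_OFFSET = 9  # offset of the TGA stop codon within REGION


def fill_in_and_insert(seq: str, insert: str) -> str:
    """Cut at T^GATCA, fill in both 4-nt 5' overhangs with Klenow, ligate a blunt insert."""
    cut = seq.index(BCLI) + 1        # top strand is cut after the T
    left = seq[:cut] + BCLI[1:5]     # fill-in extends the left top strand by GATC
    right = seq[cut:]                # right top strand keeps GATC; bottom strand is filled in
    return left + insert + right


def stop_to_gga(seq: str) -> str:
    """Site-directed TGA -> GGA change at the stop codon."""
    if seq[STOP_OFFSET:STOP_OFFSET + 3] != "TGA":
        raise ValueError("expected TGA at the stop-codon position")
    return seq[:STOP_OFFSET] + "G" + seq[STOP_OFFSET + 1:]


with_linker = fill_in_and_insert(REGION, LINKER)
mutated = stop_to_gga(with_linker)

for name, seq in (("linker inserted", with_linker), ("TGA->GGA", mutated)):
    print(name, seq, "BclI sites:", seq.count(BCLI), "AgeI sites:", seq.count(AGEI),
          "codon at stop position:", seq[STOP_OFFSET:STOP_OFFSET + 3])
```

Modelling the strands explicitly would be more faithful, but tracking the top strand is enough to show which sites survive each step.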

30 

Results 

A human NT2 neuronal precursor cell cDNA library and a
mouse embryonic carcinoma cDNA library were screened,
and a new form of human uracil-DNA glycosylase (human
UNG2) encoded by the UNG gene, as well as the
homologous cDNA from mouse (mouse UNG2), was identified.
In addition, the cDNA for the mouse homolog (encoding
mouse UNG1) of human UNG1 (Olsen et al., 1989, supra)
was identified. cDNA for human UNG2 has an ORF
encoding 44 N-terminal amino acids not found in human
UNG1, whereas cDNA for human UNG1 has an ORF encoding 35
amino acids not found in human UNG2 (Figure 4). The
two forms are identical in the rest of the amino acid
presequence, which is not required for enzyme activity,
as well as in the catalytic domain, altogether 269
identical consecutive amino acids. The sequence of the
269 amino acids common to UNG1 and UNG2, and the
corresponding DNA sequence, is identical to amino acid
residues 35-304 in Olsen et al., 1989, supra. cDNAs
for human UNG2 and its mouse homolog are apparently as
abundant as UNG1 in cDNA libraries from proliferating
cells, since among 20 cDNA clones that were sequenced, 10
were of the UNG2 type and 10 were similar to the
previously known UNG1 type. Among 4 mouse cDNA
sequences, 3 were of the UNG2 type and 1 was of the
UNG1 type. However, screening of a human hepatocyte
library with UNG40 cDNA resulted in the isolation of 80
strongly hybridizing clones, and sequencing of 14 of
these demonstrated that they were all similar to the
previously characterized cDNA for UNG1 or the cDNA
UNG40 (Olsen et al., 1989, supra).

Comparison of the human cDNA for UNG2 with the recently
published complete human UNG sequence (Haug et al.,
1996, Genomics, 36, p 408-416) revealed the presence of
a previously unrecognised exon (exon 1A) located some
650 base pairs upstream of the previously identified
exon 1 (hereinafter called exon 1B). A revised
organization of the UNG gene is therefore presented in
Figure 3. Exon 1B forms the leader sequence and codons
1-104 of the mRNA encoding the previously known form
UNG1. The mRNA corresponding to the new human cDNA is
formed by joining exon 1A (encoding 44 amino acids)
into a consensus splice site after codon 35 in exon 1B



WO 97/25416 PCT/GB97/00057

after which the two human cDNAs are identical. The
open reading frame of human UNG2 cDNA predicts a
protein of 313 amino acids, as compared to 304 amino
acids for UNG1. Genomic clones for the mouse homolog
of the UNG gene have also been isolated and sequenced.
This has revealed that the splice sites for exons 3, 4,
5 and 6 in the UNG genes from mouse and man are in
identical positions. Furthermore, PCR analyses have
demonstrated that the rest of the mouse gene is
structurally similar to the human gene, as expected
from the cDNA clones (data not shown).

Figure 4 shows how the alternative forms of mRNA for
UNG1 and UNG2 arise, as deduced from human cDNAs and the
corresponding UNG sequences, and indicates the presence
of a putative nuclear localization signal of 4 basic
residues (RKRH) in the N-terminal end of the new cDNA
and putative mitochondrial localization signals in cDNA
for UNG1. In addition, and now shown here, both human
cDNAs contain a putative nuclear localization signal
(RKRHH) in the catalytic domain (residues 258-262 in
the ORF of cDNA for UNG1). These residues are located
at the surface of the enzyme between α-helix 7 and
β-strand 4 (Mol et al., 1995, Cell, 80, p 869-878).
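The N-terminal RKRH signal can be located directly in the claimed sequence. The Python sketch below translates nucleotides 71-202 of SEQUENCE I.D. No 1 (the 44 codons contributed by exon 1A, also given in claim 18) with the standard genetic code and reports where RKRH falls. It only illustrates the sequence analysis; it is not the software used for the analyses reported here.

```python
from itertools import product

# Standard genetic code in TCAG order ('*' = stop codon).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# Nucleotides 71-202 of SEQUENCE I.D. No 1 (the DNA sequence given in claim 18).
EXON_1A_CDS = (
    "ATGATCGGCCAGAAGACGCTCTACTCCTTTTTCTCCCCCAGCCCCGCCAG"
    "GAAGCGACACGCCCCCAGCCCCGAGCCGGCCGTCCAGGGGACCGGCGTGGCTGGGGTGCC"
    "TGAGGAAAGCGGAGATGCGGCG"
)


def translate(dna: str) -> str:
    """Translate complete codons from the first base onwards."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


peptide = translate(EXON_1A_CDS)
motif_start = peptide.find("RKRH") + 1  # 1-based residue number in UNG2

print(len(EXON_1A_CDS), len(peptide), peptide)
# prints: 132 44 MIGQKTLYSFFSPSPARKRHAPSPEPAVQGTGVAGVPEESGDAA
print("RKRH at residues", motif_start, "to", motif_start + 3)
# prints: RKRH at residues 17 to 20
```

Building the codon table from `itertools.product` over "TCAG" reproduces the conventional 64-entry ordering, so a single 64-character string is enough to encode it.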

Figure 5 shows the genomic structure of exons 1A and
1B, as well as the structure of the previously
characterized promoter (hereinafter called promoter
II), possible elements in the putative promoter
upstream of exon 1A (hereinafter called promoter I) and
the alternative splice acceptor site (SEQUENCE I.D. No.
7). Promoter I probably starts after the 3'-terminal
end of two Alu-repeats (position 425) and ends
immediately upstream of the start of exon 1A. However,
it cannot be excluded that the promoter is located
upstream of the Alu-repeats. This would require the
presence of an exon encoding a leader that would be
joined to exon 1A. This is considered unlikely, since
promoter motifs upstream of the Alu-repeats have not
been detected and, furthermore, transcripts of the
required size have not been detected by Northern
analyses (data not shown). Furthermore, the cDNA for
UNG2 does not contain sequences from this upstream
region.

Figure 6 shows an alignment of predicted amino acid
presequences of the human and mouse enzymes (SEQUENCE
I.D. Nos 2 and 8-10). Note that UNG1 proteins and UNG2
proteins have been aligned separately in the parts of
the proteins that are derived from different exons (up
to codon 45 in human UNG2). Table 4 shows the % of
identical residues in the different forms, using human
UNG2 as the reference (100%). The parts of the protein
that are not required for catalytic activity are less
well conserved than the catalytic domain. Amino acids
that have been found to be critical for catalytic
activity or formation of the uracil-binding pocket (Mol
et al., 1995, supra; Kavli et al., 1996, EMBO J., 15,
p 3442-3447) or DNA binding are completely conserved in
mouse (residues Q144, D145, P146, Y147, F158, S169,
N204, S247, H268, S270, L272, S273, Y275 and R276 in
UNG1).

To compare the promoter activity of promoter II alone
and of promoter I and promoter II in combination,
promoter-luciferase gene constructs were prepared and
transient transfection experiments performed with HeLa
cells. These studies verified the promoter activity of
promoter II alone (Haug et al., 1994, supra) and
further demonstrated that when both promoters are
present in the construct, the luciferase activity
increased by some 50% (to 156% of the promoter II value),
indicating that promoter I is also active in HeLa cells,
as expected from the abundance of the new cDNA in
proliferating cells (Table 5).
Coupled transcription-translation of the two forms of
human and mouse cDNA resulted in easily measurable
uracil-DNA glycosylase activity for both forms from
mouse and man. For calculations of the relative
specific activities, the radioactivity released in
uracil-DNA glycosylase assays was compared to band
intensities on an SDS-PAGE gel from
transcription/translation reactions using
[35S]methionine (Table 6).

To examine whether human UNG1 and UNG2 were
translocated to different subcellular compartments,
constructs expressing fusion proteins of the UNG
proteins and a red-shifted variant of green fluorescent
protein (EGFP-N1) were prepared. These were used for
transient transfection experiments with HeLa cells.
The major advantage of the green fluorescent protein
(over the use of antibodies) is that this method relies
on the autofluorescence of this protein alone, and thus
possible cross-reaction of the antibody with epitopes
in irrelevant proteins is not a problem. The control
(pEGFP-N1) shows that the green fluorescent protein
displays a homogeneous staining over the cells (Figure
7A). In contrast, the UNG2-EGFP-N1 fusion protein is
exclusively located in the nuclei (Figure 7C), and the
UNG1-EGFP-N1 fusion protein (Figure 7D) is mainly, if
not exclusively, located in extranuclear spots that
have the same appearance as mitochondria stained with
Texas Red (Figure 7B). These results provide
convincing experimental evidence that UNG2 is a nuclear
protein and UNG1 a mitochondrial protein.
Table 4. Conservation of amino acids in four homologs
of uracil-DNA glycosylase calculated as % identity with
human UNG2*

                        % identity of domains
          Variant          Common          Catalytic     Overall
          presequence#     presequence     domain        identity
          (1-44)           (45-63)         (64-313)      (1-313)
hUNG1     2                100             100           90
mUNG2     64               75              91            86
mUNG1     2                75              91            79

* The identity is calculated for the domains in UNG2
compared with the corresponding domains in the other
forms. # The identity of the presequences of hUNG1 and
mUNG1 is 27%, with 82% identity overall.
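The values in Table 4 are percentages of identical residues between aligned domains. The Python sketch below shows the calculation for one gap-free 45-residue stretch of the catalytic domain copied from Figure 6 (human UNG2 residues 91-135 against mouse UNG2 residues 84-128). How gaps were counted for the presequences is not stated here, so the function deliberately rejects segments of unequal length. A single stretch is also not expected to reproduce the 91% reported for the whole catalytic domain.

```python
def percent_identity(a: str, b: str) -> float:
    """Percentage of identical positions between two gap-free, equal-length aligned segments."""
    if len(a) != len(b):
        raise ValueError("segments must be aligned to the same length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


# Aligned stretch from Figure 6: human UNG2 91-135 and mouse UNG2 84-128.
HUMAN_91_135 = "PVGFGESWKKHLSGEFGKPYFIKLMGFVAEERKHYTVYPPPHQVF"
MOUSE_84_128 = "PAGFGESWKQQLCGEFGKPYFVKLMGFVAEERNHHKVYPPPEQVF"

print(round(percent_identity(HUMAN_91_135, MOUSE_84_128), 1))
# prints: 80.0
```

Here 36 of the 45 positions are identical. Applying the same function to each full domain of the Figure 6 alignment is how domain-level figures of this kind are usually obtained.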

Table 5. Promoter activities in the UNG gene*

Promoter-reporter gene construct    Luciferase activity (%)
pGL2-Basic                          0.8 ± 0.4
pGL2-ProB                           100 ± 8
pGL2-ProAB                          156 ± 4

* The promoter activity of pGL2-ProB (promoter II) was
arbitrarily set to 100%. pGL2-Basic is a control lacking
a promoter.
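Table 5 supports the statement that adding promoter I raises reporter activity by about half. The Python snippet below expresses the increase with an approximate uncertainty, using first-order error propagation for a ratio. It relies on two assumptions not stated in the table: that the ± values are standard deviations, and that the two measurements are independent.

```python
import math

# Luciferase activity from Table 5 (% of pGL2-ProB, i.e. promoter II alone).
PRO_B, SD_B = 100.0, 8.0      # pGL2-ProB: promoter II
PRO_AB, SD_AB = 156.0, 4.0    # pGL2-ProAB: promoters I and II

ratio = PRO_AB / PRO_B
# First-order propagation of relative errors for a quotient
# (assumes independent errors and treats the +/- values as standard deviations).
sd_ratio = ratio * math.sqrt((SD_B / PRO_B) ** 2 + (SD_AB / PRO_AB) ** 2)

increase_pct = 100.0 * (ratio - 1.0)
print(f"increase: {increase_pct:.0f}% +/- {100.0 * sd_ratio:.0f} percentage points")
# prints: increase: 56% +/- 13 percentage points
```

The promoterless pGL2-Basic control (0.8%) is small enough to be ignored in this estimate.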
Table 6. Relative specific activities of different
forms of UNG after translation in rabbit reticulocyte
lysates*

Protein        dpm*     Area (mm²)    Activity (dpm/area)
human UNG1     1291     0.054         23907
human UNG2     6360     0.268         23731
mouse UNG1     921      0.061         15098
mouse UNG2     856      0.051         16784

* Relative specific activities were calculated from
measured dpm values (3H-uracil released in uracil-DNA
glycosylase assays) and areas under the curve of
scanned bands on SDS-PAGE gels after subtraction of
background values of 123 dpm.
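The last column of Table 6 is the dpm value divided by the band area. The Python sketch below reproduces it. It assumes, as the footnote states, that the 123 dpm background has already been subtracted from the dpm values, and it takes the third row to refer to mouse UNG1.

```python
# Table 6: background-corrected dpm and scanned band area for each translated form.
# The third row is taken to be mouse UNG1 (assumption about the table's labelling).
TABLE6 = {
    "human UNG1": (1291, 0.054),
    "human UNG2": (6360, 0.268),
    "mouse UNG1": (921, 0.061),
    "mouse UNG2": (856, 0.051),
}

# Relative specific activity: released 3H-uracil (dpm) per unit area of the 35S-labelled band.
activity = {form: dpm / area for form, (dpm, area) in TABLE6.items()}

for form, value in activity.items():
    print(f"{form}: {value:.0f}")

print("human UNG2 / UNG1:", round(activity["human UNG2"] / activity["human UNG1"], 3))
```

The computed values match the table to the nearest unit, and human UNG1 and UNG2 differ by less than 1%.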
Claims

1. A DNA glycosylase capable of releasing cytosine
bases from single stranded (ss) DNA and/or double
stranded (ds) DNA (cytosine-DNA glycosylase) or thymine
bases from both single stranded (ss) DNA and double
stranded (ds) DNA or from single stranded (ss) DNA
(thymine-DNA glycosylase) or uracil bases from single
stranded (ss) DNA and/or double stranded (ds) DNA
(uracil-DNA glycosylase), wherein said uracil-DNA
glycosylase is encoded by a nucleic acid molecule
comprising the sequence (SEQUENCE I.D. Nos 1 and 2):



   1 CACAGCCACA GCCAGGGCTA GCCTCGCCGG TTCCCGGGTG GCGCGCGTTC GCTGCCTCCT
  61 CAGCTCCAGG ATGATCGGCC AGAAGACGCT CTACTCCTTT TTCTCCCCCA GCCCCGCCAG
     MIG QKT LYSF FSP SPA
 121 GAAGCGACAC GCCCCCAGCC CCGAGCCGGC CGTCCAGGGG ACCGGCGTGG CTGGGGTGCC
     RKRH APS PEP AVQG TGV AGV
 181 TGAGGAAAGC GGAGATGCGG CGGCCATCCC AGCCAAGAAG GCCCCGGCTG GGCAGGAGGA
     PEES GDA AAI PAKK APA GQE
 241 GCCTGGGACG CCGCCCTCCT CGCCGCTGAG TGCCGAGCAG TTGGACCGGA TCCAGAGGAA
     EPGT PPS SPL SAEQ LDR IQR
 301 CAAGGCCGCG GCCCTGCTCA GACTCGCGGC CCGCAACGTG CCCGTGGGCT TTGGAGAGAG
     NKAA ALL RLA ARNV PVG FGE
 361 CTGGAAGAAG CACCTCAGCG GGGAGTTCGG GAAACCGTAT TTTATCAAGC TAATGGGATT
     SWKK HLS GEF GKPY FIK LMG
 421 TGTTGCAGAA GAAAGAAAGC ATTACACTGT TTATCCACCC CCACACCAAG TCTTCACCTG
     FVAE ERK HYT VYPP PHQ VFT
 481 GACCCAGATG TGTGACATAA AAGATGTGAA GGTTGTCATC CTGGGACAGG ATCCATATCA
     WTQM CDI KDV KVVI LGQ DPY
 541 TGGACCTAAT CAAGCTCACG GGCTCTGCTT TAGTGTTCAA AGGCCTGTTC CGCCTCCGCC
     HGPN QAH GLC FSVQ RPV PPP
 601 CAGTTTGGAG AACATTTATA AAGAGTTGTC TACAGACATA GAGGATTTTG TTCATCCTGG
     PSLE NIY KEL STDI EDF VHP
 661 CCATGGAGAT TTATCTGGGT GGGCCAAGCA AGGTGTTCTC CTTCTCAACG CTGTCCTCAC
     GHGD LSG WAK QGVL LLN AVL
 721 GGTTCGTGCC CATCAAGCCA ACTCTCATAA GGAGCGAGGC TGGGAGCAGT TCACTGATGC
     TVRA HQA NSH KERG WEQ FTD
 781 AGTTGTGTCC TGGCTAAATC AGAACTCGAA TGGCCTTGTT TTCTTGCTCT GGGGCTCTTA
     AVVS WLN QNS NGLV FLL WGS
 841 TGCTCAGAAG AAGGGCAGTG CCATTGATAG GAAGCGGCAC CATGTACTAC AGACGGCTCA
     YAQK KGS AID RKRH HVL QTA
 901 TCCCTCCCCT TTGTCAGTGT ATAGAGGGTT CTTTGGATGT AGACACTTTT CAAAGACCAA
     HPSP LSV YRG FFGC RHF SKT
 961 TGAGCTGCTG CAGAAGTCTG GCAAGAAGCC CATTGACTGG AAGGAGCTGT GATCATCAGC
     NELL QKS GKK PIDW KEL
1021 TGAGGGGTGG CCTTTGAGAA GCTGCTGTTA ACGTATTTGC CAGTTACGAA GTTCCACTGA
1081 AAATTTTCCT ATTAATTCTT AAGTACTCTG CATAAGGGGG AAAAGCTTCC AGAAAGCAGC
1141 CATGAACCAG GCTGTCCAGG AATGGCAGCT GTATCCAACC ACAAACAACA AAGGCTACCC
1201 TTTGACCAAA TGTCTTTCTC TGCAACATGG CTTCGGCCTA AAATATGCAG AAGACAGATG
1261 AGGTCAAATA CTCAGTTGGC TCTCTTTATC TCCCTTGCCT TTATGGTGAA ACAGGGGAGA
1321 TGTGCACCTT TCAGGCACAG CCCTAGTTTG GCGCCTGCTG CTCCTTGGTT TTGCCTGGTT
1381 AGACTTTCAG TGACAGATGT TGGGGTGTTT TTGCTTAGAA AGGTCCCCTT GTCTCAGCCT
1441 TGCAGGGCAG GCATGCCAGT CTCTGCCAGT TCCACTGCCC CCTTGATCTT TGAAGGAGTC
1501 CTCAGGCCCC TCGCAGCATA AGGATGTTTT GCAACTTTCC AGAATCTGGC CCAGAAATTA
1561 GGGCTCAATT TCCTGATTGT AGTAGAGGTT AAGATTGCTG TGAGCTTTAT CAGATAAGAG
1621 ACCGAGAGAA GTAAGCTGGG TCTTGTTATT CCTTGGGTGT TGGTGGAATA AGCAGTGGAA
1681 TTTGAACAAG GAAGAGGAGA AAAGGGAATT TTGTCTTTAT GGGGTGGGGT GATTTTCTCC
1741 TAGGGTTATG TCCAGTTGGG GTTTTTAAGG CAGCACAGAC TGCCAAGTAC TGTTTTTTTT
1801 AACCGACTGA AATCACTTTG GGATATTTTT TCCTGCAACA CTGGAAAGTT TTAGTTTTTT
1861 AAGAAGTACT CATGCAGATA TATATATATA TATTTTTCCC AGTCCTTTTT TTAAGAGACG
1921 GTCTTTATTG GGTCTGCACC TCCATCCTTG ATCTTGTTAG CAATGCTGTT TTTGCTGTTA
1981 GTCGGGTTAG AGTTGGCTCT ACGCGAGGTT TGTTAATAAA AGTTTGTTAA AAGTTCAAAA
2041 AAAAAAAAAA AAA



or a fragment thereof encoding a catalytically active
product comprising at least nucleotides 121 to 130 in
addition to the catalytic domain, or a sequence which
is degenerate, substantially homologous with or which
hybridizes with at least nucleotides 121 to 130 of any
such aforesaid sequence.

2. A cytosine-DNA glycosylase (CDG) as claimed in
claim 1.

3. A cytosine-DNA glycosylase (CDG) as claimed in
claim 1 or 2 wherein said CDG is capable of releasing
both cytosine and uracil bases from ssDNA and/or dsDNA.

4. A cytosine-DNA glycosylase (CDG) as claimed in any
one of claims 1 to 3 wherein said CDG is derived from
UDG.

5. A CDG as claimed in claim 4 wherein Asn at amino
acid position 204 in human UDG protein or equivalent
residue in other species is substituted or modified.

6. A CDG as claimed in claim 5 wherein said Asn or
equivalent residue is replaced with an aspartic acid
residue (Asp).

7. A thymine-DNA glycosylase (TDG) as claimed in claim
1.

8. A thymine-DNA glycosylase (TDG) as claimed in claim
1 or 7 wherein said TDG is capable of releasing both
thymine and uracil bases from both ssDNA and dsDNA.

9. A thymine-DNA glycosylase (TDG) as claimed in any
one of claims 1, 7 or 8 wherein said TDG is capable of
releasing thymine bases from A:T DNA pairs.

10. A thymine-DNA glycosylase (TDG) as claimed in any
one of claims 1, 7 or 8 wherein said TDG is capable of
releasing thymine bases from single stranded DNA.

11. A thymine-DNA glycosylase (TDG) as claimed in any
one of claims 1 or 6 to 9 wherein said TDG is derived
from UDG.

12. A TDG as claimed in claim 11 wherein Tyr at amino
acid position 147 in human UDG protein or equivalent
residue in other species is substituted or modified.

13. A TDG as claimed in claim 12 wherein said Tyr or
equivalent residue is replaced with an alanine residue
(Ala).

14. A CDG or TDG as claimed in any one of claims 4 to
6 or 11 to 13 wherein said UDG is human.

15. A uracil-DNA glycosylase (UDG) as claimed in claim
1.

16. A uracil-DNA glycosylase (UDG) as claimed in claim
1 or 15 wherein said fragment comprises at least
nucleotides 71 to 202 and/or degenerate, substantially
homologous and hybridizing sequences are degenerate,
substantially homologous with or hybridize with at
least nucleotides 71 to 202.

17. A UDG as claimed in claim 16 wherein degenerate,
substantially homologous and hybridizing sequences are
degenerate, substantially homologous with or hybridize
with the entire sequence.

18. Nuclear localization peptides encoded by a nucleic
acid molecule comprising the sequence (SEQUENCE I.D.
Nos 3 and 4):

ATGATCGGCC AGAAGACGCT CTACTCCTTT TTCTCCCCCA GCCCCGCCAG
MIG QKT LYSF FSP SPA
GAAGCGACAC GCCCCCAGCC CCGAGCCGGC CGTCCAGGGG ACCGGCGTGG CTGGGGTGCC
RKRH APS PEP AVQG TGV AGV
TGAGGAAAGC GGAGATGCGG CG
PEES GDA A

or a fragment thereof encoding a functional equivalent
or a sequence which is degenerate, substantially
homologous with or which hybridizes with any such
aforesaid sequence.

19. Nuclear localizing peptides as claimed in claim 18
wherein said peptides include the amino acid sequence
RKRH.

20. A DNA glycosylase as claimed in any one of claims
1 to 17 which additionally comprises at least one
nuclear localization peptide sequence as defined in
claim 18 or 19 or at least one mitochondrial
localization peptide sequence encoded by a nucleic acid
molecule comprising the sequence (SEQUENCE I.D. Nos 5
and 6):

ATGGGCGTCT TCTGCCTTGG GCCGTGGGGG TTGGGCCGGA AGCTGCGGAC GCCTGGGAAG
MGV FCL GPWG LGR KLR TPGK
GGGCCGCTGC AGCTCTTGAG CCGCCTCTGC GGGGACCACT TGCAG
GPL QLL SRLC GDH LQ

or a fragment thereof encoding a functional equivalent
or a sequence which is degenerate, substantially
homologous with or which hybridizes with any such
aforesaid sequence.

21. An assay for the identification of DNA
glycosylases as defined in any one of claims 1 to 14 in
a sample, in which said assay comprises at least the
step of assaying for activity in the sample which is
capable of excising thymine or cytosine and optionally
also uracil from an introduced ssDNA and/or dsDNA
substrate.

22. A nucleic acid molecule comprising a nucleotide
sequence which encodes a DNA glycosylase and/or nuclear
localizing peptide as defined in any one of claims 1 to
20.

23. An expression vector containing a nucleic acid
molecule as defined in claim 22.

24. A transformed or transfected host cell carrying a
nucleic acid molecule as defined in claim 22.

25. Use of one or more DNA glycosylases as defined in
any one of claims 1 to 17 in in vitro and/or in vivo
mutagenesis systems.

26. Use of one or more DNA glycosylases as defined in
any one of claims 1 to 14 for removing contaminating
DNA prior to PCR amplification.

27. Use of one or more DNA glycosylases as defined in
any one of claims 1 to 14 in DNA modification.

28. A method of killing cells, comprising the steps of
introducing a DNA glycosylase as defined in any one of
claims 1 to 17 into a cell and expressing said DNA
glycosylase in the cell to an extent which results in
the killing of that cell.

29. A method of performing enzymatic DNA sequencing to
determine the position of cytosine and/or thymine bases
by treating said DNA with at least one CDG and/or TDG
as defined in any one of claims 1 to 14.







Figure 1 
Figure 3




Figure 4 




Figure 5 
hUNG1    MGVFCLGPWGLGRKLRTPGKGPLQLLSRLCGDHLQK            36
mUNG1    MGV LQRRSJUR — ImARRAGXmR&I* TPNPDSVSRQK        31
hUNG2    MIGQKTLYSFFSPSPARKRHAPSPEPAVQGTGVAGVPEESGDAAA   45
mUNG2    MTGQKTLYSFFSPTPTGKRTTRSPEP-VPQSGVAA--BXGGDAVA   42

hUNG2/1  IPAKKAPAGQEEPGTPPSSPLSAEQLDRIQRNKAAALLRLAARNV   90/81
mUNG2/1  SPAKKAR VE QKEQG SPLSAEQLVRIQRNKAAALLRLAARNV    83/72

hUNG2/1  PVGFGESWKKHLSGEFGKPYFIKLMGFVAEERKHYTVYPPPHQVF   135/126
mUNG2/1  PAGFGESWKQQLCGEFGKPYFVKLMGFVAEERNHHKVYPPPEQVF   128/117

hUNG2/1  TWTQMCDIKDVKVVILGQDPYHGPNQAHGLCFSVQRPVPPPPSLE   180/171
mUNG2/1  TWTQMCDIRDVKVVILGQDPYHGPNQAHGLCFSVQRPVPPPPSLE   173/162

hUNG2/1  NIYKELSTDIEDFVHPGHGDLSGWAKQGVLLLNAVLTVRAHQANS   225/216
mUNG2/1  NIFKELSTDIDGFVHPGHGDLSGWARQGVLLLNAVLTVRAHQANS   218/207

hUNG2/1  HKERGWEQFTDAVVSWLNQNSNGLVFLLWGSYAQKKGSAIDRKRH   270/261
mUNG2/1  HKERGWEQFTDAVVSWLNQNLSGLVFLLWGSYAQKKGSVIDRKRH   263/252

hUNG2/1  HVLQTAHPSPLSVYRGFFGCRHFSKTNELLQKSGKKPIDWKEL     313/304
mUNG2/1  HVLQTAHPSPLSVYRGFLGCRHFSKANELLQKSGKKPINWKEL     306/295



Figure 6 
Figure 7



